CN113971100A - Method for monitoring at least one computing unit - Google Patents

Info

Publication number
CN113971100A
Authority
CN
China
Prior art keywords
monitoring module
error
error measure
level
executed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110830471.5A
Other languages
Chinese (zh)
Inventor
A·沃格尔
M·穆勒
R·葛文纳
R·戴博尔特
E·玛格尔
M·欧佩兰德
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Robert Bosch GmbH
Original Assignee
Robert Bosch GmbH
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Robert Bosch GmbH

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0721 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment within a central processing unit [CPU]
    • G06F11/0724 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment within a central processing unit [CPU] in a multiprocessor or a multi-core unit
    • G06F11/0715 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a system implementing multitasking
    • G06F11/0751 Error or fault detection not based on redundancy
    • G06F11/0754 Error or fault detection not based on redundancy by exceeding limits
    • G06F11/0757 Error or fault detection not based on redundancy by exceeding limits by exceeding a time limit, i.e. time-out, e.g. watchdogs
    • G06F11/0766 Error or fault reporting or storing
    • G06F11/0784 Routing of error reports, e.g. with a specific transmission path or data flow
    • G06F11/0793 Remedial or corrective actions
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3017 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is implementing multitasking
    • G06F11/3024 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a central processing unit [CPU]
    • G06F11/3089 Monitoring arrangements determined by the means or processing involved in sensing the monitored data, e.g. interfaces, connectors, sensors, probes, agents
    • G06F11/3093 Configuration details thereof, e.g. installation, enabling, spatial arrangement of the probes

Abstract

The invention relates to a method for monitoring at least one computing unit in which a plurality of processes are implemented. The computing unit has at least one subordinate software monitoring module, a superordinate software monitoring module that is superordinate to the at least one subordinate software monitoring module, and a hardware monitoring module that is superordinate to the superordinate software monitoring module. The at least one subordinate software monitoring module monitors at least one process implemented in the at least one computing unit, the superordinate software monitoring module monitors the at least one subordinate software monitoring module, and the hardware monitoring module monitors the superordinate software monitoring module. If an error is recognized by the at least one subordinate software monitoring module or the superordinate hardware monitoring module during the respective monitoring, a predefined error measure is executed. A check is made as to whether the predefined error measure was executed correctly, and if the check shows that it was not, a superordinate error measure is executed.

Description

Method for monitoring at least one computing unit
Technical Field
The present invention relates to a method for monitoring at least one computing unit, to a system comprising computing units, and to a computer program for carrying out the method.
Background
To monitor a computing unit or the processes it carries out, monitoring modules, for example so-called watchdog modules, can be provided, which recognize errors through data exchange, for example in the form of inquiry/reply communication. Such a watchdog can check, for example, whether the data exchange or the answers obtained are correct and whether these answers arrive at the correct point in time or within a predefined time window. If this is not the case, an error of the computing unit or of the corresponding process can be inferred. Such a monitoring module or watchdog may be implemented in hardware as a separate hardware element, or in software for execution by a processor unit.
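The inquiry/reply check described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the `Challenge` structure, the reply rule in `expected_answer` and all timing values are assumptions chosen only to show that a reply must be both correct and inside the time window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Challenge:
    """Hypothetical inquiry sent by the watchdog to the monitored process."""
    value: int
    sent_at: float        # time the inquiry was sent, in seconds
    window_start: float   # earliest acceptable reply delay, in seconds
    window_end: float     # latest acceptable reply delay, in seconds


def expected_answer(value: int) -> int:
    # Assumed reply rule for illustration only: invert the low 8 bits.
    return (~value) & 0xFF


def reply_is_valid(challenge: Challenge, answer: int, received_at: float) -> bool:
    """Return True only if the answer is correct AND arrives inside the window."""
    delay = received_at - challenge.sent_at
    in_window = challenge.window_start <= delay <= challenge.window_end
    return in_window and answer == expected_answer(challenge.value)
```

A reply that is correct but too early or too late is rejected just like a wrong reply, which is what distinguishes a window watchdog from a simple timeout.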
DE 102018210733 A1 discloses a system of monitoring modules that are divided into different hierarchical levels. In the top level, a top monitoring module is provided as a hardware monitoring module, such as a hardware watchdog. At least one software monitoring module, for example a software watchdog, is arranged in each of at least one lower level below the top level. Each software monitoring module of the hierarchy monitors at least one assigned process and is in turn monitored by a superordinate monitoring module of a higher level. If a software monitoring module identifies an error in its assigned process, a predefined measure, for example an error reaction, is carried out in order to eliminate the identified error. Such an error reaction can be, for example, a reset, a restart or a shutdown of the respective process or of the respective computing unit.
Disclosure of Invention
Against this background, a method for monitoring at least one computing unit in which a plurality of processes are implemented, a computing unit, a system composed of computing units, and a computer program for carrying out the method are proposed, having the features of the independent patent claims. Advantageous embodiments are the subject matter of the dependent claims and of the following description.
At least one subordinate, process-specific or application-specific software monitoring module monitors at least one process implemented in the at least one computing unit. Such a subordinate software monitoring module may be implemented, for example, by a processor or a processor core of a corresponding computing unit. For example, such lower level software monitoring modules may be provided separately for different domains of the computing unit. The subordinate software monitoring module is in particular embedded in the normal program run of the respective computing unit.
The superordinate software monitoring module monitors the at least one subordinate software monitoring module. In particular, a superordinate software monitoring module can be provided for each computing unit, which monitors all subordinate software monitoring modules implemented in that computing unit. The superordinate software monitoring module may be provided, for example, in a secure computing unit ("secure backbone"), in particular a secure processor or a secure processor core. The superordinate software monitoring module is in particular provided as a software monitoring module specific to the computing unit or as a central internal software monitoring module.
The superordinate hardware monitoring module monitors the superordinate software monitoring module. The hardware monitoring module is in particular provided as an external hardware module, independent of the at least one computing unit. In particular, the hardware monitoring module monitors a plurality of superordinate software monitoring modules, each of which is provided for monitoring a plurality of subordinate software monitoring modules in the respective computing unit.
Each of these software or hardware monitoring modules is in particular provided as a corresponding watchdog. For example, each of these monitoring modules may perform a respective monitoring by means of an inquiry/answer communication or a pass/fail check.
If an error is detected by the at least one subordinate software monitoring module or the superordinate hardware monitoring module during the respective monitoring operation, a predefined error measure is carried out, for example an error reaction to eliminate the detected error. Such an error may be an error or a malfunction of the implemented process or of one of the monitoring modules. The error measure can be executed, for example, by the monitoring module that identified the error or by a monitoring module superordinate to it.
Within the framework of the method, it is checked whether the predefined error measure has been carried out correctly. If this check shows that the predefined error measure was not executed correctly, a superordinate error measure is executed. An incorrectly executed error measure may indicate, for example, an error of the monitoring module executing the error measure. It may likewise indicate that the identified error cannot be eliminated or suppressed as desired by the error measure, for example because the error is more serious or more complicated than originally assumed.
The method therefore proposes using a multi-level system of monitoring modules or watchdogs that are superordinate to one another for a multi-level error reaction, or for multi-level monitoring of error reactions. The monitoring module system enables multi-level monitoring of the computing unit or of the processes implemented in it, and also monitoring of whether an identified error is reacted to correctly. If an error measure is not performed correctly, or if it is insufficient to eliminate the recognized error, the method allows an individual and rapid reaction. The security and integrity of the computing unit or of the system of computing units can thereby be improved.
In conventional monitoring concepts, a corresponding error measure is usually carried out when an error is identified, but the error measure itself is mostly not monitored further. If the error measure is not implemented correctly, or if it does not adequately address the identified error, this can usually be recognized, if at all, only when a new error is detected. In contrast, the present method explicitly monitors the error measures that are performed and allows a separate reaction if an error measure is not performed correctly.
Particularly advantageously, the method is an extension of the monitoring concept shown in DE 102018210733 A1. As stated at the outset, DE 102018210733 A1 relates to a system of hardware and software monitoring modules that are divided into different levels. The method extends this monitoring concept to multi-level error monitoring, that is, to multi-level monitoring of error measures. The system of hardware and software monitoring modules superordinate to one another that is used in the method can be designed in particular as the monitoring module system of DE 102018210733 A1, composed of monitoring modules in three levels. For a detailed description of such a system of monitoring modules at different levels, reference is made to DE 102018210733 A1, the disclosure of which is incorporated into the content of the present application.
In an advantageous embodiment, it is checked whether the upper-level error measure was executed correctly. If it is recognized during this check that the upper-level error measure was not executed correctly, the upper-level error measure or a further upper-level error measure is advantageously executed again and it is checked again whether the further upper-level error measure was executed correctly. In particular, each executed error measure is therefore checked, and a further error measure is expediently executed in each case if the error measure is not executed correctly. In particular, therefore, the upper-level error measures can be executed in a cascaded manner, so that individual reactions to incorrectly executed error measures can be made. In particular, further higher-level error measures can be executed as long as necessary, until the last, highest-level error measure is executed, for example, the entire processing unit or the entire system of computing units is deactivated.
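The cascaded execution of error measures can be sketched as follows in Python. This is only an illustration under stated assumptions: the `Stage` tuple, the function `run_cascade` and the idea of pairing each measure with its own verification callable are hypothetical names and design choices, not part of the patent text.

```python
from typing import Callable, List, Optional, Tuple

# Each stage pairs an error measure with a check that reports whether the
# measure took effect. Both callables are hypothetical stand-ins.
Stage = Tuple[str, Callable[[], None], Callable[[], bool]]


def run_cascade(stages: List[Stage]) -> Optional[str]:
    """Execute measures in order until one is verified as correctly executed.

    Returns the name of the first verified measure, or None if even the
    last (highest-level) measure could not be verified.
    """
    for name, execute, verify in stages:
        execute()
        if verify():
            return name
    return None
```

Ordering the list from the mildest measure (for example restarting a process) to the most drastic one (for example deactivating the whole system) reproduces the escalation described above; measures after the first verified one are never executed.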
Preferably, the predefined error measure is executed as a function of the error detected and a predefined error response time. Thus, depending on the severity or extent of the respective error, the error may be responded to more or less quickly. For example, in the case of a safety-critical process having an error, it is possible to react with a short error reaction time, so that the error can be eliminated as quickly as possible.
Preferably, if the predefined error measure and/or a further higher-level error measure is not executed correctly, the corresponding higher-level error measure is executed as a function of the incorrectly executed error measure and with a corresponding predefined error reaction time. An incorrect execution of an error measure can thus be reacted to with its own error reaction time. Specific reactions to different types of errors or error sources can therefore be implemented, so that individual errors of a process, of a monitoring module or of an error reaction are each reacted to with an individual error reaction time.
Preferably, the monitoring module that identified the error and/or a monitoring module superordinate to it checks whether the predefined error measure has been executed correctly. The error measure can thus be checked by the same entity or at the same hierarchy level at which the error was identified, or by a superordinate entity or at a superordinate hierarchy level.
Preferably, the monitoring module that recognizes that the predefined error measure was not executed correctly, and/or a monitoring module superordinate to it, executes the superordinate error measure. In the case of an incorrectly executed error measure, the corresponding superordinate error measure can therefore be carried out by the same entity or hierarchy level that carried out the original error measure, or by a superordinate entity or hierarchy level.
According to a particularly advantageous embodiment, if an error is detected in a process, the process identified as having the error is deactivated or restarted as the predefined error measure. The subordinate software monitoring module to which the process is assigned can in particular carry out the deactivation or restart itself or trigger it.
Preferably, whether the deactivation of the process identified as having an error was carried out correctly is checked by verifying that transmissions which the process performs during error-free operation no longer take place. During error-free operation, the process can in particular transmit specific data, values, messages or the like to other processors or components of the computing unit. If the process has an error and is to be deactivated, the subordinate software monitoring module assigned to the process in particular checks whether these transmissions have actually ceased. If the transmissions continue, this indicates that the process has not been deactivated and that the error measure was therefore not performed correctly.
Preferably, if it is recognized that the shutdown or restart of the process identified as having an error was not performed correctly as an error measure, the hardware component implementing that process is shut down or restarted as a superordinate error measure. The hardware component may in particular be a processor or a processor core of the corresponding computing unit or, where appropriate, the entire computing unit. In particular, the hardware component also implements the subordinate software monitoring module assigned to the faulty process, and it may also implement the corresponding superordinate software monitoring module. The subordinate software monitoring module expediently notifies its superordinate software monitoring module and/or the hardware monitoring module of the error measure that was not performed correctly. The superordinate software or hardware monitoring module can also be informed indirectly of an incorrectly executed error measure, for example when an error counter reaches a threshold value. The superordinate software and/or hardware monitoring module then executes the superordinate error measure and deactivates or restarts the hardware component. Particularly expediently, the hardware monitoring module, as the top-level entity, performs the restart or shutdown of the hardware component.
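One way to picture measure-specific error reaction times is a lookup table that assigns each measure a deadline and a superordinate measure. The following Python sketch is an assumption-laden illustration: the names `MeasureSpec`, `MEASURES` and `decide`, the measure names and all time values are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class MeasureSpec:
    reaction_time_s: float        # assumed deadline for the measure to take effect
    escalates_to: Optional[str]   # superordinate measure, None at the top level


# Hypothetical table; the values are illustrative only.
MEASURES: Dict[str, MeasureSpec] = {
    "restart_process": MeasureSpec(0.050, "reset_core"),
    "reset_core": MeasureSpec(0.200, "shutdown_unit"),
    "shutdown_unit": MeasureSpec(0.500, None),
}


def decide(current: str, started_at: float, now: float, confirmed: bool) -> str:
    """Decide what to do about a running error measure.

    Returns "done" if the measure is confirmed, "wait" while its reaction
    time has not yet elapsed, and otherwise the name of the superordinate
    measure (or "no_further_measure" at the top of the cascade).
    """
    spec = MEASURES[current]
    if confirmed:
        return "done"
    if now - started_at < spec.reaction_time_s:
        return "wait"
    return spec.escalates_to or "no_further_measure"
```

Keeping the deadline next to the measure it belongs to makes it easy to give a failed process restart a shorter reaction time than a failed core reset.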
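The check that transmissions have ceased can be sketched in Python as a comparison of message timestamps against the time of deactivation. The function name and the `grace_period_s` parameter (an allowance for messages already in flight) are assumptions made for this illustration.

```python
from typing import Iterable


def deactivation_confirmed(
    message_timestamps: Iterable[float],
    deactivated_at: float,
    grace_period_s: float = 0.0,
) -> bool:
    """Return True if no message from the process was observed after the
    deactivation time plus an assumed grace period.

    A message observed later than that cutoff indicates that the process
    is still running and the error measure was not executed correctly.
    """
    cutoff = deactivated_at + grace_period_s
    return all(t <= cutoff for t in message_timestamps)
```

If this function returns False, the monitoring module would treat the deactivation as failed and trigger the superordinate error measure.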
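The indirect notification via an error counter can be sketched as follows in Python. The class name `ErrorCounter` and its interface are hypothetical; the patent only states that a superordinate module can be informed when an error counter reaches a threshold value.

```python
class ErrorCounter:
    """Counts incorrectly executed error measures (illustrative sketch)."""

    def __init__(self, threshold: int) -> None:
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.count = 0

    def record_failed_measure(self) -> bool:
        """Count one failed measure.

        Returns True once the threshold is reached, which a superordinate
        monitoring module could interpret as a request to escalate.
        """
        self.count += 1
        return self.count >= self.threshold
```

With a threshold of 1 the superordinate module reacts to the first failed measure; a higher threshold tolerates a limited number of failed attempts before the hardware component is restarted or shut down.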
According to a preferred embodiment, the at least one subordinate software monitoring module is connected to the superordinate software monitoring module via a first interface. Alternatively or additionally, the upper level software monitoring module is preferably connected to the upper level hardware monitoring module via a second interface. These interfaces may in particular be implemented in software and/or in hardware, respectively. The first interface between the lower software monitoring module and the upper software monitoring module is expediently designed as a software interface. The second interface between the upper-level software monitoring module and the upper-level hardware monitoring module is expediently designed as a hardware interface. The first interface and the second interface are in particular standardized interfaces. The monitoring modules are expediently designed for the standardized interfaces, so that a flexible exchange of the individual monitoring modules is possible.
Advantageously, the at least one subordinate software monitoring module is exchanged via the first interface as an error measure and/or as a superordinate error measure. Via the first interface, the exchanged subordinate software monitoring module can be connected to the superordinate software monitoring module. Alternatively or additionally, the superordinate software monitoring module is preferably exchanged via the first interface and the second interface. Via these two interfaces, the superordinate software monitoring module can be flexibly connected to the subordinate software monitoring module and to the superordinate hardware monitoring module. Alternatively or additionally, the superordinate hardware monitoring module is preferably exchanged via the second interface. The hardware monitoring module can be connected to the corresponding superordinate software monitoring module via the second interface. These standardized interfaces thus make it possible to embed other processes and software monitoring modules, or to switch to another hardware monitoring module, in a cost-effective manner.
The method is particularly suitable for use in the automotive field. The computing unit to be monitored can in particular be designed as a control unit in a (motor) vehicle, or as a microcontroller or microprocessor in a (motor) vehicle. The processes to be monitored, which are carried out by the computing unit, can in particular be safety-critical functions that are carried out for the safe operation and control of the vehicle, for example in engine control or in driving assistance functions. By monitoring such a computing unit within the framework of the method, safety requirements in the (motor) vehicle field can be met, in particular those specified in the ISO 26262 standard, for example by the so-called Automotive Safety Integrity Level (ASIL), that is, the safety requirement level specified by ISO 26262 for safety-relevant systems in motor vehicles.
The computing unit according to the invention, or the system according to the invention composed of at least two such computing units, is set up, in particular by programming, to carry out the method according to the invention. Particularly preferably, the computing unit or the system of computing units is implemented in a control device of a motor vehicle.
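The role of a standardized interface in exchanging monitoring modules can be illustrated with a small Python sketch. Everything here is an assumption for illustration: the abstract class `MonitoringModule`, its `heartbeat` method, the two example implementations and the slot names are hypothetical and not defined by the patent.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class MonitoringModule(ABC):
    """Assumed standardized interface between two monitoring levels."""

    @abstractmethod
    def heartbeat(self) -> bool:
        """Return True if the module reports itself as working."""


class HealthyModule(MonitoringModule):
    def heartbeat(self) -> bool:
        return True


class StuckModule(MonitoringModule):
    def heartbeat(self) -> bool:
        return False


class UpperLevelMonitor:
    """Supervises any module that implements the assumed interface."""

    def __init__(self) -> None:
        self._slots: Dict[str, MonitoringModule] = {}

    def attach(self, slot: str, module: MonitoringModule) -> None:
        # Attaching to an occupied slot exchanges the module in that slot.
        self._slots[slot] = module

    def failed_slots(self) -> List[str]:
        return sorted(s for s, m in self._slots.items() if not m.heartbeat())
```

Because the upper-level monitor depends only on the interface, exchanging a faulty subordinate module as an error measure amounts to attaching a different implementation to the same slot.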
The implementation of the method according to the invention in the form of a computer program or a computer program product with program code for carrying out all method steps is also advantageous, in particular when the control device which carries out the method is also used for other tasks and is therefore always present, since this results in particularly low costs. Data carriers suitable for providing the computer program are, in particular, magnetic, optical and electronic memories, such as hard disks, flash memories, EEPROMs, DVDs and others. It is also feasible to download the program via a computer network (internet, intranet, etc.).
Further advantages and embodiments of the invention emerge from the description and the drawing.
The invention is schematically illustrated in the drawings and will be described below with reference to the drawings according to embodiments.
Drawings
Fig. 1a to 1d schematically show control devices of a vehicle, which are each set up to carry out a preferred embodiment of the method according to the invention.
Fig. 2 schematically shows a preferred embodiment of the method according to the invention as a block diagram.
Detailed Description
In fig. 1a, a control device 100 of a (motor) vehicle, for example for implementing a driving assistance function, is schematically shown.
The control device 100 has a microcontroller 110, which in turn has a plurality of processor cores 111, 112. In each processor core 111, 112, a plurality of processes 131, 132, 141, 142 are implemented. Furthermore, a secure computation unit 113 configured as a secure core is provided in the microcontroller 110.
In order to monitor the control device 100 for errors, a system 200 is provided, which is composed of monitoring modules or watchdogs and has a plurality of lower-level software monitoring modules 231, 232, an upper-level software monitoring module 221 and an upper-level hardware monitoring module 211.
In each of the processor cores 111, 112, a subordinate, process-specific or application-specific software monitoring module 231, 232 is provided. These subordinate software monitoring modules 231, 232 are each implemented as a program or a process by the respective processor core 111, 112 and are embedded in the program execution of this processor core. Each of these software monitoring modules 231, 232 monitors the processes 131, 132, 141, 142 implemented in the respective processor core 111, 112.
An upper-level internal central software monitoring module 221 is also provided, which is implemented, for example, as a program or process in the secure computing unit 113 of the microcontroller 110. Each lower software monitoring module 231, 232 is connected to and monitored by the upper software monitoring module 221 via a first interface 261, 262.
The upper-level hardware monitoring module 211 is provided as an external hardware unit that is independent of the control device 100. The upper level software monitoring module 221 is connected to the upper level hardware monitoring module 211 via the second interface 251 and is monitored by the upper level hardware monitoring module.
Furthermore, the control device 100 can also be designed as a system consisting of a plurality of computing units and can, for example, comprise a plurality of microcontrollers, as is schematically illustrated in fig. 1b, 1c and 1d.
In the examples of fig. 1b, 1c and 1d, the control device 100 also has a second microcontroller 120, which in turn has a plurality of processor cores 121, 122, in each of which a plurality of processes 151, 152, 161, 162 are implemented.
In these processor cores 121, 122, a subordinate, process-specific or application-specific software monitoring module 233, 234 is also provided in order to monitor the processes 151, 152, 161, 162 implemented in the respective processor core 121, 122.
Furthermore, in the example of fig. 1b, an upper software monitoring module 221 implemented in the secure computation unit 113 of the first microcontroller 110 is also provided for monitoring lower software monitoring modules 233, 234 of the second microcontroller 120. For this purpose, the lower software monitoring modules 233, 234 are connected to the upper software monitoring module 221 via corresponding interfaces 263, 264, respectively, so that the monitoring module 221 can communicate directly with the lower modules 233, 234 via the inter-processor communication 115.
Furthermore, it is also conceivable: in the second microcontroller 120, an own upper software monitoring module 222 is provided, as shown in fig. 1 c. For example, for this purpose, the second microcontroller 120 also has a secure computation unit 123, which is designed as a secure kernel and in which the upper-level software monitoring module 222 is implemented as a program or process. The subordinate software monitoring modules 233, 234 are connected to the upper level software monitoring module 222 via corresponding interfaces 263, 264, respectively, and are monitored by the upper level software monitoring module. In addition, the upper level software monitoring module 222 of the second microcontroller 120 remains connected to the upper level software monitoring module 221 of the first microcontroller 110 via the inter-processor communication 115. Furthermore, in this case, the upper software monitoring module 221 of the first microcontroller 110 is provided in particular for relaying or coordinating the communication between the upper software monitoring module 222 and the upper hardware monitoring module 211 of the second microcontroller 120.
Alternatively, it is also conceivable: the upper level software monitoring module 222 and the hardware monitoring module 211 of the second microcontroller 120 can communicate directly with each other, as shown in fig. 1 d. In this case, the software monitoring module 222 is directly connected to the hardware monitoring module 211 via a corresponding interface 252. In this case, inter-processor communication between the upper-level software monitoring modules 221 and 222 of the two microcontrollers 110, 120 is not necessary for communication between the software monitoring module 222 and the hardware monitoring module 211. It is easy to understand that: however, the microcontrollers 110 and 120 may still be connected to each other via inter-processor communication.
Thus, in the examples of fig. 1b, 1c and 1d, the hardware monitoring module 211 is the common superordinate module of all upper-level software monitoring modules 221, 222 of the two microcontrollers 110, 120 and also of all lower-level software monitoring modules 231, 232, 233, 234 of the two microcontrollers 110, 120. Alternatively, it is also conceivable to use a second hardware monitoring module, such that the first hardware monitoring module is assigned to the first microcontroller 110 and the second hardware monitoring module is assigned to the second microcontroller 120.
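The monitoring hierarchy of fig. 1c can be written down as a small parent map. The following Python sketch is illustrative only: the names `SUPERORDINATE` and `escalation_chain` are hypothetical, and the entry for module 232 assumes that it reports to module 221 in the same way as module 231.

```python
# Parent map for the monitoring modules of fig. 1c (reference numerals as keys).
# Assumption: module 232 reports to module 221 just as module 231 does.
SUPERORDINATE = {
    231: 221, 232: 221,   # lower-level modules of microcontroller 110
    233: 222, 234: 222,   # lower-level modules of microcontroller 120
    221: 211, 222: 211,   # upper-level software modules -> hardware module
}

def escalation_chain(module):
    """Return the module followed by all of its superordinate modules."""
    chain = [module]
    while chain[-1] in SUPERORDINATE:
        chain.append(SUPERORDINATE[chain[-1]])
    return chain

print(escalation_chain(233))  # [233, 222, 211]
```

Such a map makes it easy to see which module would be informed next if a measure at one level fails.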
Furthermore, the interfaces 251, 252, 261, 262, 263, 264 make it possible to exchange individual monitoring modules flexibly and independently of one another if necessary.
It also goes without saying that the control device 100 may have further components, such as further microcontrollers or microprocessors, analog-to-digital converters (ADCs), digital-to-analog converters (DACs), inputs/outputs (I/Os), and so forth.
With the system 200 of mutually superordinate monitoring modules or watchdog modules shown in fig. 1a, 1b, 1c and 1d, multi-level monitoring of the control device 100 and also a multi-level error reaction, or multi-level monitoring of the error reaction, can be achieved. For this purpose, the control device 100 is set up to carry out a preferred embodiment of the method according to the invention.
A preferred embodiment of the method according to the invention is shown in fig. 2 as a block diagram and is explained below, by way of example, with reference to the control device shown in fig. 1c.
In step 301, the control device 100 operates normally, and the respective processor cores 111, 112, 121, 122 implement their respective processes 131, 132, 141, 142, 151, 152, 161, 162.
In step 302, multi-level monitoring of the control device 100 is performed. In this case, each lower-level software monitoring module 231, 232, 233, 234 monitors its respective processes 131, 132, 141, 142, 151, 152, 161, 162. The upper-level software monitoring modules 221, 222 in turn monitor the lower-level software monitoring modules 231, 232, 233, 234, and the hardware monitoring module 211 monitors the upper-level software monitoring modules 221, 222. Each of these monitoring operations can be carried out, for example, by means of challenge/response communication.
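A challenge/response check between a monitoring module and the module it monitors might look like the following Python sketch. The transformation in `expected_response`, the class `MonitoredModule` and the function `check` are hypothetical and serve only to illustrate the principle; the text does not specify a concrete scheme.

```python
def expected_response(challenge):
    # Hypothetical transformation; a real watchdog defines its own scheme.
    return (challenge * 3 + 1) & 0xFF

class MonitoredModule:
    def __init__(self, healthy=True):
        self.healthy = healthy

    def respond(self, challenge):
        # A faulty module is simulated by returning a wrong answer.
        answer = expected_response(challenge)
        return answer if self.healthy else answer ^ 0xFF

def check(monitored, challenge):
    """Return True if the monitored module answers the challenge correctly."""
    return monitored.respond(challenge) == expected_response(challenge)
```

The same check can be reused at every level: a lower-level module is challenged by its upper-level software module, which in turn is challenged by the hardware monitoring module.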
For example, tasks of different ASIL levels may be executed in the different processor cores 111, 112, 121, 122 as processes 131, 132, 141, 142, 151, 152, 161, 162, respectively. In this case, the individual tasks in the different processor cores 111, 112, 121, 122 are each processed, for example, by means of a ring buffer and monitored by the respective lower-level software monitoring modules 231, 232, 233, 234. The size of the individual ring buffers can be selected in each case in particular as a function of the category (master) of the respective lower-level software monitoring module 231, 232, 233, 234 and also in particular as a function of the fault tolerance time of the respective task.
The application-specific lower-level software monitoring modules 231, 232, 233, 234 in particular read the respective ring buffers continuously and check the entries of these ring buffers for correctness and plausibility. Furthermore, the upper-level software monitoring modules 221, 222 expediently check the correctness and plausibility of the lower-level software monitoring modules 231, 232, 233, 234.
Expediently, the ring buffers of the different processor cores 111, 112, 121, 122 can be separated from one another in an application-specific and ASIL-specific manner, so that in particular there is no interference or mutual influence between different applications or between different ASIL levels of the applications.
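The ring-buffer monitoring described above can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the sizing rule in `ring_size`, the entry layout (sequence number, value), and the plausibility criteria (consecutive sequence numbers, values within a range) are not taken from the text.

```python
from collections import deque

def ring_size(fault_tolerance_ms, task_period_ms):
    # Assumed sizing rule: one entry per task cycle that fits into the
    # fault tolerance time, but at least one entry.
    return max(1, fault_tolerance_ms // task_period_ms)

class TaskRing:
    def __init__(self, size):
        # A bounded deque overwrites the oldest entry when it is full.
        self.entries = deque(maxlen=size)

    def write(self, seq, value):
        self.entries.append((seq, value))

def entries_plausible(ring, lo, hi):
    """Check that sequence numbers are consecutive and values lie in [lo, hi]."""
    items = list(ring.entries)
    for (seq_a, _), (seq_b, _) in zip(items, items[1:]):
        if seq_b != seq_a + 1:
            return False
    return all(lo <= value <= hi for _, value in items)
```

Using one `TaskRing` per application and ASIL level keeps the entries of different tasks separate, in the spirit of the separation described above.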
In step 303, an error is identified. For example, the lower-level software monitoring module 231 recognizes that the process 131 in the processor core 111 is no longer being executed properly.
In step 304, a predefined error measure is executed in accordance with a predefined error reaction time in order to eliminate the detected error. For example, the lower-level software monitoring module 231 restarts the faulty process 131 for this purpose.
In step 305, it is checked whether the predefined error measure was executed correctly. For this purpose, the lower level software monitoring module 231 checks whether the faulty process 131 has been restarted and is now functioning correctly. If this is the case, normal operation 301 of the control device 100 continues.
However, if it is recognized during the check 305 that the predefined error measure was not executed correctly, a superordinate, second error measure is executed in step 306 in accordance with a predefined second error reaction time. If, for example, the lower-level software monitoring module 231 identifies in step 305 that the faulty process 131 cannot be restarted, this is reported to the upper-level software monitoring module 221, which in step 306 restarts the processor core 111 or deactivates the faulty process 131 as the superordinate, second error measure.
In step 307, it is then monitored whether the superordinate error measure has been executed correctly. For this purpose, the upper-level software monitoring module 221 monitors, for example, whether processor core 111 has been restarted or whether the deactivation of process 131 was successful. If this is the case, normal operation 301 of the control device 100 is resumed.
However, if it is recognized during this check 307 that the superordinate, second error measure was not executed correctly, a further superordinate, third error measure is executed in accordance with a predefined third error reaction time.
In this case, the upper-level software monitoring module 221 reports to the hardware monitoring module 211 that processor core 111 cannot be restarted or that process 131 cannot be deactivated. This report can also be made, for example, by no longer answering correctly the challenges issued within the framework of the challenge/response communication with the hardware monitoring module 211.
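The escalation from one level to the next can be sketched in Python as a loop over (measure, check) pairs. The function `escalate`, the simulated measures and the `state` dictionary are hypothetical; the example simulates the case in which the restart of process 131 fails and the restart of processor core 111 succeeds.

```python
def escalate(levels):
    """Run error measures in order and escalate while the check fails.

    `levels` is a list of (name, measure, check) tuples, lowest level first.
    Returns the name of the level whose measure succeeded, or None.
    """
    for name, measure, check in levels:
        measure()        # steps 304 / 306 / 308: execute the error measure
        if check():      # steps 305 / 307: was it executed correctly?
            return name
    return None

state = {"process_ok": False, "core_ok": False}

def restart_process():
    pass  # simulated: the restart of process 131 has no effect

def restart_core():
    state["process_ok"] = state["core_ok"] = True  # simulated success

levels = [
    ("231: restart process 131", restart_process, lambda: state["process_ok"]),
    ("221: restart core 111", restart_core, lambda: state["core_ok"]),
    ("211: deactivate outputs", lambda: None, lambda: True),
]
result = escalate(levels)
print(result)  # 221: restart core 111
```

Each level only runs if the check of the level below it has failed, which mirrors the sequence of steps 304 to 308.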
If, for example, an error counter or challenge/response counter within the framework of the challenge/response communication reaches a threshold value, the hardware monitoring module 211 deactivates the outputs, output stages and interfaces of the control device 100 in step 308 as the superordinate, third error measure. In addition, a restart of the control device 100 may be triggered, for example, by the hardware monitoring module 211. In this case, the deactivation of the outputs, output stages and interfaces of the control device 100 expediently persists after the restart and can, particularly expediently, only be reversed by the hardware monitoring module 211.
In step 309, it is then checked whether the error counter has decreased and has fallen below the predefined threshold value again. If this is not the case, this indicates that the error in process 131 still exists. The outputs, output stages and interfaces of the control device 100 remain deactivated.
However, if the error counter falls below the threshold again, this indicates that process 131 is functioning without error again. In this case, the hardware monitoring module 211 reactivates the outputs, output stages and interfaces of the control device 100 in step 310. Normal operation 301 of the control device 100 can then be resumed.
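The threshold behaviour of steps 308 to 310 can be sketched in Python as follows. The class `HardwareMonitor`, the threshold value and the counting rule (plus one for a wrong or missing response, minus one for a correct response, never below zero) are assumptions made for illustration; the text does not specify how the counter is incremented or decremented.

```python
class HardwareMonitor:
    def __init__(self, threshold=3):
        self.threshold = threshold      # hypothetical threshold value
        self.error_counter = 0
        self.outputs_enabled = True

    def record(self, response_ok):
        # Assumed counting rule: +1 for a wrong or missing response,
        # -1 for a correct response, never below zero.
        if response_ok:
            self.error_counter = max(0, self.error_counter - 1)
        else:
            self.error_counter += 1
        # Step 308: outputs are off while the counter is at or above the
        # threshold. Step 310: they are re-enabled once it falls below.
        self.outputs_enabled = self.error_counter < self.threshold
```

With a threshold of 3, three consecutive wrong responses deactivate the outputs, and a single correct response afterwards brings the counter back below the threshold and reactivates them.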
The method thus enables multi-level monitoring of the computing unit 100 and a multi-level error reaction, or multi-level monitoring of an error reaction, by means of a system 200 of monitoring modules. It is thereby possible to react individually, with an individual error reaction time, to an error measure that has not been executed correctly.
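Giving each escalation level its own reaction time can be sketched in Python as a small lookup. The table `REACTION_TIME_MS`, its values and the function `next_level` are hypothetical; the sketch also assumes that the reaction time acts as a deadline within which a measure must have been executed correctly.

```python
# Hypothetical reaction times per escalation level, in milliseconds.
REACTION_TIME_MS = {1: 10, 2: 50, 3: 200}

def next_level(level, executed_ok, elapsed_ms):
    """Return 0 for normal operation, otherwise the level that acts next."""
    in_time = elapsed_ms <= REACTION_TIME_MS[level]
    if executed_ok and in_time:
        return 0  # back to normal operation 301
    # Escalate to the next level, capped at the highest defined level.
    return min(level + 1, max(REACTION_TIME_MS))
```

A measure that succeeds too late is treated here like a failed measure, which is one possible reading of the individual reaction times.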

Claims (14)

1. A method for monitoring at least one computing unit (100, 110, 120) in which a plurality of processes (131, 132, 141, 142, 151, 152, 161, 162) are implemented,
the computing unit comprises at least one lower software monitoring module (231, 232, 233, 234), an upper software monitoring module (221, 222) of the at least one lower software monitoring module (231, 232, 233, 234), and an upper hardware monitoring module (211) of the upper software monitoring module (221, 222),
wherein the at least one lower level software monitoring module (231, 232, 233, 234) monitors at least one process (131, 132, 141, 142, 151, 152, 161, 162) implemented in the at least one computing unit, wherein the upper level software monitoring module (221, 222) monitors the at least one lower level software monitoring module (231, 232, 233, 234) and wherein the hardware monitoring module (211) monitors (302) the upper level software monitoring module (221, 222),
wherein a predefined error measure is executed (304) if the at least one lower-level software monitoring module (231, 232, 233, 234) or the upper-level software monitoring module (221, 222) or the upper-level hardware monitoring module (211) recognizes (303) an error during the respective monitoring,
wherein it is checked whether the predefined error measure was executed correctly (305), and if it is recognized during the checking that the predefined error measure was not executed correctly, a superordinate error measure is executed (306).
2. Method according to claim 1, wherein it is checked (307) whether the upper-level error measure was executed correctly, wherein, if it is recognized during the checking that the upper-level error measure was not executed correctly, a further upper-level error measure is executed (308), and wherein it is checked again (309) whether this further upper-level error measure was executed correctly.
3. Method according to claim 1 or 2, wherein the predefined error measure is executed according to a predefined error reaction time on the basis of the identified error.
4. Method according to one of the preceding claims, wherein if the predefined error measure is not executed correctly, the corresponding upper-level error measure is executed according to the incorrectly executed error measure with the corresponding predefined error reaction time.
5. Method according to one of the preceding claims, wherein the monitoring module which identifies the error and/or a superordinate monitoring module of the monitoring modules checks whether the predefined error measure is executed correctly.
6. Method according to one of the preceding claims, wherein a monitoring module which identifies that the predefined error measure was not executed correctly and/or a superordinate monitoring module of the monitoring module executes the superordinate error measure.
7. The method as claimed in one of the preceding claims, wherein if an error of a process (131) is identified (303), the process (131) identified as having the error is deactivated or restarted (304) as a predefined error measure.
8. Method according to claim 7, wherein it is checked (305) whether the deactivation of the process (131) identified as having an error was performed correctly as an error measure by checking whether a transmission that is performed while the process (131) is executed without error no longer takes place.
9. The method according to claim 7 or 8, wherein if a shutdown or restart of a process (131) identified as having an error is identified as not being performed correctly as an error measure (305), a hardware component (111) implementing the process (131) identified as having an error is shutdown or restarted (306) as an upper-level error measure.
10. The method according to one of the preceding claims, wherein the at least one subordinate software monitoring module (231, 232, 233, 234) is connected with the superior software monitoring module (221, 222) via a first interface (261, 262, 263, 264) and/or wherein the superior software monitoring module (221, 222) is connected with the superior hardware monitoring module (211) via a second interface (251, 252).
11. Method according to claim 10, wherein the at least one subordinate software monitoring module (231, 232, 233, 234) is exchanged via the first interface (261, 262, 263, 264) as an error measure and/or as an upper-level error measure; and/or the upper level software monitoring module (221, 222) is exchanged via the first interface (261, 262, 263, 264) and the second interface (251, 252); and/or the upper level hardware monitoring module (211) is exchanged via the second interface (251, 252).
12. A computing unit or a system of at least two computing units (100, 110, 120) having means for performing the method according to any of the preceding claims.
13. A computer program which, when executed on a computing unit, causes the computing unit to perform all the method steps of the method according to any one of claims 1 to 11.
14. A machine readable storage medium having stored thereon a computer program according to claim 13.
CN202110830471.5A 2020-07-22 2021-07-22 Method for monitoring at least one computing unit Pending CN113971100A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
DE102020209228.1 2020-07-22
DE102020209228.1A DE102020209228A1 (en) 2020-07-22 2020-07-22 Method for monitoring at least one computing unit

Publications (1)

Publication Number Publication Date
CN113971100A true CN113971100A (en) 2022-01-25

Family

ID=79179114

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110830471.5A Pending CN113971100A (en) 2020-07-22 2021-07-22 Method for monitoring at least one computing unit

Country Status (2)

Country Link
CN (1) CN113971100A (en)
DE (1) DE102020209228A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115904850B (en) * 2023-01-09 2023-05-12 深流微智能科技(深圳)有限公司 Power-on detection method of multi-core processor, readable storage medium and GPU

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102009038434A1 (en) 2009-08-21 2011-02-24 Delphi Delco Electronics Europe Gmbh Processor system for controlling multiple functional components of vehicle, has processor for controlling functional components of component group and another processor for controlling functional components of another component group
DE102018210733A1 (en) 2018-06-29 2020-01-02 Robert Bosch Gmbh Method for monitoring at least one computing unit

Also Published As

Publication number Publication date
DE102020209228A1 (en) 2022-01-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination