EP1854007A2

EP1854007A2 - Method, operating system, and computing device for processing a computer program

Info

Publication number: EP1854007A2
Application number: EP05777777A
Authority: EP
Inventors: Reinhard Weiberle; Bernd Mueller; Werner Harter; Thomas Kottke; Yorck Collani; Rainer Gmehlich
Original assignee: Robert Bosch GmbH
Current assignee: Robert Bosch GmbH
Priority date: 2004-08-04
Filing date: 2005-07-25
Publication date: 2007-11-14
Also published as: RU2431182C2; JP4728334B2; WO2006015945A2; WO2006015945A3; DE102004037713A1; BRPI0513229A; US20090217090A1; CN1993679B; RU2007106437A; US7890800B2; JP2008508626A; CN1993679A

Abstract

The invention relates to a method for processing a computer program (23) on a computing device (20), particularly a microprocessor (22). Said computer program (23) comprises several program objects that are configured as tasks, for example. Transient and permanent errors are detected during the time the computer program (23) is processed on the computing device (20). In order to be able to constructively treat transient errors when the same occur in a computer system such that the operability and the operational safety of the computer system are reestablished within the shortest possible error tolerance interval, a program object that has already been fed for processing is converted into a defined state and is restarted from said defined state when an error is detected. Said program object represents a run time object of the computer program, a task, for example. According to the inventive method, one or several tasks which continue to be processed or have already been processed when the error occurs can be restarted and executed.

Description

Method, operating system and computing device for executing a computer program

The present invention relates to a method for processing a computer program on a computing device, in particular on a microprocessor. The computer program includes several program objects. In the method, errors are detected during the execution of the computer program on the computing device.

The invention also relates to an operating system that is executable on a computing device, in particular on a microprocessor.

Finally, the present invention also relates to a computing device for processing a computer program comprising a plurality of program objects. The computing device includes an error detection mechanism for detecting an error during execution of the computer program on the computing device.

State of the art

During the execution of a computer program on a computing device, so-called transient errors can occur. As the structures on semiconductor devices (so-called chips) become smaller and smaller, while the clock rates of the signals become ever higher and the signal voltages ever lower, transient errors occur more and more frequently. In contrast to permanent errors, transient errors are only temporary and usually disappear after some time. In the case of transient errors, only individual bits are wrong, without the computing device itself being permanently damaged. Transient errors can have different causes, such as electromagnetic influences, alpha particles or neutrons.

In communication systems, the focus in error handling is already on transient errors. In communication systems (e.g. Controller Area Network, CAN), it is known to resend erroneously transmitted data when an error is detected. In addition, it is known to use an error counter in communication systems, which is incremented upon detection of an error, decremented upon correct transmission, and prevents the transmission of data as soon as it exceeds a certain value.
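The counter behaviour known from communication systems can be illustrated with a minimal sketch. The names `TransmitErrorCounter` and `send_with_retransmit`, the step sizes and the limit are illustrative assumptions; they are not taken from the CAN specification or from this document.

```python
class TransmitErrorCounter:
    """Counter that rises on errors, falls on successes and can block sending."""

    def __init__(self, limit, error_step=1, success_step=1):
        # limit and step sizes are hypothetical example parameters
        self.limit = limit
        self.error_step = error_step
        self.success_step = success_step
        self.value = 0

    def record(self, ok):
        """Decrement (not below zero) on success, increment on error."""
        if ok:
            self.value = max(0, self.value - self.success_step)
        else:
            self.value += self.error_step

    @property
    def may_transmit(self):
        """Sending is prevented as soon as the counter exceeds the limit."""
        return self.value <= self.limit


def send_with_retransmit(frame, channel, counter):
    """Resend `frame` until `channel` accepts it or the counter blocks sending.

    `channel` is any callable returning True for a correct transmission.
    Returns the number of attempts, or None if sending was blocked.
    """
    attempts = 0
    while counter.may_transmit:
        attempts += 1
        ok = channel(frame)
        counter.record(ok)
        if ok:
            return attempts
    return None
```

With a channel that fails twice and then succeeds, the frame is delivered on the third attempt. With a channel that always fails, the counter eventually exceeds its limit and further transmission is prevented.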

In computing devices for processing computer programs, however, error handling essentially takes place only for permanent errors. Consideration of transient errors is limited to incrementing and possibly decrementing an error counter. The counter is stored in a memory and can be read out off-line as diagnostic or error information, for example during a workshop visit in the case of a computing device designed as a motor vehicle control unit. Only then can the error be reacted to accordingly.

Error handling by means of an error counter therefore does not allow error handling within the short error tolerance time required in particular for safety-relevant systems, nor does it allow constructive error handling in the sense that the computer program is processed properly again within the error tolerance time. Instead, in the prior art, the computer program is switched to an emergency mode once the error counter exceeds a certain value. This means that another part is processed instead of the erroneous part of the computer program, and the substitute values determined in this way are used for further calculation. For example, the substitute values can be modeled on the basis of other variables. Alternatively, the results calculated with the faulty part of the computer program can be rejected as faulty and replaced, for further calculation, by standard values intended for emergency operation. The known methods for handling a transient error of a computer program running on a computing device thus do not allow systematic, constructive handling that takes the transient nature of most errors into account.

It is also known from the prior art to eliminate transient errors occurring during the execution of a computer program on a computing device by completely restarting the computing device. This solution is not really satisfactory either, since variables obtained in the course of processing the computer program so far are lost and the computing device cannot fulfill its intended function for the duration of the restart. This is unacceptable, especially in safety-relevant systems.

Finally, it is also known, as error handling for transient errors of a computer program processed on a computing device, to reset the computer program by a few clock cycles and to repeat individual machine instructions of the computer program. This process is also known as micro roll-back. In the known method, only objects at the machine level (clock cycles, machine instructions) are rolled back. This requires appropriate hardware support at the machine level, which entails considerable effort in the area of the computing device. Execution of the known method purely under software control is not possible.

The error handling mechanisms known from the prior art cannot respond appropriately to transient errors that occur during the processing of a computer program on a computing device.

The present invention is based on the object of handling transient errors constructively when they occur during the processing of a computer program in a computer system, so that the full functionality and operational safety of the computer system are restored within the shortest possible error tolerance time.

To achieve this object, it is proposed, on the basis of the method of the type mentioned above, that, when an error is detected, at least one program object that has already been fed to processing is transferred into a defined state and restarted from this state.

Advantages of the invention

The program object that is restarted need not have been completely executed when the error was detected. For the purposes of the invention, program objects that had not yet been fully processed at the time of error detection, but whose execution had already begun, can also be restarted when an error occurs. Thus, according to the invention, at least one operating system object is executed again when a transient or a permanent error occurs. The advantages over micro roll-back are, in particular, that the repetition of a program object can be realized with very little hardware support. At most, additional storage space is required to store some information needed for the re-execution of the program object (e.g., the input variables of the program object). The actual administration of the method according to the invention can be carried out by the operating system of the computing device. That is, the method according to the invention can be realized with conventional, commercially available processors, without the need for additional hardware. Of course, it is also possible to realize the method with hardware support.

The error detection itself can be carried out by any method. Any type of error detection mechanism that can detect errors during the processing of the computer program (so-called concurrent checking) is conceivable. For example, in a dual-core architecture the entire computer core is duplicated. If the computer cores are operated in a lockstep mode, it can be compared for each instruction whether both computer cores deliver the same results. A difference in the results then reliably indicates an error. This error detection mechanism thus detects errors during the processing of the program objects in real time. The same applies to error-detecting codes that are used throughout the processor architecture, or to duplicated subcomponents of the computing device. All of these error detection mechanisms have in common that they detect transient errors very quickly and deliver an error signal when an error has been detected.
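The comparison of two redundantly computed results can be illustrated in software. The following is only a sketch of the comparison principle: real lockstep operation is a hardware mechanism, and the names `ErrorSignal`, `lockstep_execute`, `healthy_core` and `make_glitchy_core` (a fault injector that flips one bit) are hypothetical.

```python
class ErrorSignal(Exception):
    """Stands in for the error signal raised when the two results differ."""


def healthy_core(instruction, state):
    """Model of a core that executes one instruction correctly."""
    return instruction(state)


def make_glitchy_core(fail_at_call):
    """Model of a core that flips the lowest result bit on one chosen call."""
    calls = {"n": 0}

    def core(instruction, state):
        calls["n"] += 1
        result = instruction(state)
        if calls["n"] == fail_at_call:
            result ^= 1  # simulated transient bit flip
        return result

    return core


def lockstep_execute(instructions, core_a, core_b, state):
    """Run every instruction on both cores and compare the integer results."""
    for index, instruction in enumerate(instructions):
        result_a = core_a(instruction, state)
        result_b = core_b(instruction, state)
        if result_a != result_b:
            raise ErrorSignal(f"mismatch at instruction {index}")
        state = result_a
    return state
```

Because the comparison takes place after every instruction, a deviation is signalled before the next instruction consumes the corrupted value.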

In response to such an error signal, an error handling mechanism that repeats the program object can be triggered. If the same error occurs again during the renewed execution, a permanent error can be concluded, or an error counter can be incremented, in which case a permanent error is concluded only when the counter exceeds a certain value. If, on the other hand, the error does not occur when the program object is executed again, it can be assumed that the error was transient. Already during the error-free re-execution of the program object, the computer program is again ready for its intended function. Availability is thus restored after a very short time. Repeating at least one program object is therefore a good way to handle transient errors.

According to an advantageous development of the invention, it is proposed that the program objects are designed as runtime objects of the computer program (referred to below as tasks) and that at least one task is executed again when an error is detected. A task is a typical object at the operating system level. The repetition of a task can be realized with minimal effort, if desired even purely under software control.
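The repetition of a task and the resulting distinction between transient and permanent errors can be sketched purely in software. In this sketch the error signal is modeled as a hypothetical exception `DetectedError`, and the function name `execute_with_repeat` and the parameter `max_repeats` are likewise assumptions made for illustration.

```python
class DetectedError(Exception):
    """Stands in for the error signal of the error detection mechanism."""


def execute_with_repeat(task, inputs, max_repeats=1):
    """Run `task`; on a detected error, restart it from its saved inputs.

    Returns (result, classification), where classification is
    "ok" (no error), "transient" (error vanished on repetition) or
    "permanent" (error still present after max_repeats repetitions).
    """
    saved_inputs = dict(inputs)  # defined state stored before execution
    for attempt in range(max_repeats + 1):
        try:
            # each run gets a fresh copy, so a failed run cannot spoil the state
            result = task(dict(saved_inputs))
        except DetectedError:
            continue  # repeat the task from the defined state
        return result, ("ok" if attempt == 0 else "transient")
    return None, "permanent"
```

Only the copy of the input variables is needed as additional storage, which reflects the point made above that the repetition of a program object requires very little support beyond memory.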

According to a preferred embodiment of the invention it is proposed that a program object executed at the time of the detection of the error is restarted. Alternatively or additionally, however, program objects can also be started and executed again, which were already completely processed at the time the error was detected.

It is proposed that at least one defined state of the program objects is generated and stored during the execution of the program objects, in particular at the beginning of the execution of the program objects. This can be done, for example, by storing the values of all variables relevant to the state of the program object.

Furthermore, it is proposed that a further computing device, operating redundantly to the computing device on which the computer program with its several program objects is executed, be used for error detection.

Of course, more than one redundant computing device can be used for error detection.

Advantageously, the method according to the invention is used in a motor vehicle, in particular in a motor vehicle control unit, in order to ensure safe and reliable processing of the computer program despite unavoidable transient errors. This is of particular importance for the execution of open-loop and/or closed-loop control programs in safety-critical applications in a motor vehicle.

It is further proposed that a permanent error be concluded if the same error occurs again during the re-execution of the at least one program object. It is also conceivable that a permanent error is concluded only when the error still occurs after a predefinable number of repetitions of the program object. In this case, a transient error is still concluded even if the error disappears only after a third or later repetition of the program object. By means of this development of the invention, important program objects can be repeated, for example, 3 times instead of only 2 times.

According to another advantageous embodiment of the invention, it is proposed that the number of repetitions of the at least one program object is limited to a predefinable value. This prevents the same program object from being repeated indefinitely in the event of a permanent error. The number of repetitions of the at least one program object can be limited, for example, by means of a counter or via time limits. By specifying a task-dependent repeat value, it is also possible to repeat important tasks more often than less important ones, and thus to give important tasks more opportunities or more time to run without transient errors, while for less important tasks a permanent error is concluded relatively quickly and another system response is initiated.

According to a further preferred embodiment of the invention, it is proposed that the number of repetitions of the at least one program object be dynamically limited to a predefinable value. Advantageously, the number of repetitions of the at least one program object is dynamically limited to a predefinable value as a function of the time still remaining for scheduling. In this way, for example, a first task and a second task can run through, while a third task can be repeated several times.
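The document does not state how the dynamic limit is calculated. One plausible reading, assumed here purely for illustration, is that the number of permitted repetitions is the number of complete task runs that still fit into the remaining scheduling time, capped by a static upper bound. The function name and all parameter names are hypothetical.

```python
def dynamic_repeat_limit(remaining_time_us, task_duration_us, static_limit):
    """Number of repetitions that still fit into the remaining time.

    remaining_time_us: time still available for scheduling (assumed unit: µs)
    task_duration_us:  assumed execution time of one run of the task
    static_limit:      predefinable upper bound on the repetitions
    """
    if task_duration_us <= 0:
        raise ValueError("task duration must be positive")
    if remaining_time_us <= 0:
        return 0
    return min(static_limit, remaining_time_us // task_duration_us)
```

As a worked example under these assumptions: in a 1000 µs period, a first and a second task that each run through in 200 µs leave 600 µs. A third task lasting 150 µs could then be repeated up to 600 // 150 = 4 times, provided the static limit is at least 4.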

In order to implement the method according to the invention, it is proposed that the values of the variables necessary for executing the program object or defining the state of the program object be stored during the execution of the computer program prior to the execution of a program object. According to this embodiment, therefore, the variables of all program objects are stored.

Alternatively, it is proposed that, in the case of a computer program that is executed periodically in a period, execution jumps back, upon detection of an error, to a specific program object at a predefinable return point in the period of the computer program. Thus, according to this embodiment, if an error occurs, execution always jumps back to the same position within the period. Preferably, during execution of the computer program, the values of all variables relevant to the state of the program object are saved only before the execution of the program object at the return point. Thus, the values of the relevant variables of the program object at the return point have to be stored only once per cycle or period. This saves storage time and storage space.

In a renewed execution of a program object after the detection of an error, the stored input variables are then called and made available to the program object to be executed again as input variables.

As a further embodiment of the invention, it is proposed that a plurality of return points be created for a program object. When an error occurs, not the entire program object but only a part of it then has to be executed again. If an error occurs, execution simply jumps back to the previous return point, up to which the execution of the program object was error-free. For example, if the program object has been executed without error up to the n-th return point and an error occurs between this and the (n+1)-th return point, execution can jump back to the n-th return point. The program object is then processed again from the n-th return point. This saves time. Preferably, each time a return point is passed, at least one defined state is generated and stored during execution of the program object.

Of particular importance is the realization of the method according to the invention in the form of an operating system. In this case, the operating system is executable on a computing device, in particular on a microprocessor, and is programmed to carry out the method according to the invention when it runs on the computing device. In this case, therefore, the invention is realized by the operating system, so that this operating system represents the invention in the same way as the method that the operating system is suited to carry out. The operating system is preferably stored on a memory element and is transferred to the computing device for processing. In particular, any data carrier or electrical storage medium can be used as the memory element, for example a random access memory (RAM), a read-only memory (ROM) or a flash memory.

As a further solution of the object of the present invention, it is proposed on the basis of the computing device of the type mentioned above that the computing device has an error handling mechanism which causes a re-execution of at least one program object upon detection of an error by the error detection mechanism.

According to an advantageous development of the invention, it is proposed that the error handling mechanism has a trigger logic which, when a fault is detected, restarts the at least one program object.

According to a preferred embodiment, it is proposed that a real-time operating system, for example OSEKtime, run on the computing device. Finally, it is proposed that the computing device comprises a microprocessor.

Drawings

Other features, possible applications and advantages of the invention will become apparent from the following description of exemplary embodiments of the invention, which are illustrated in the drawing. All described or illustrated features, alone or in any combination, form the subject matter of the invention, regardless of their summary in the claims or the back-references of the claims, and regardless of their wording or representation in the description or in the drawing. The figures show:

Fig. 1 a flowchart of a method according to the invention in accordance with a preferred embodiment; and

Fig. 2 shows an inventive computing device according to its preferred embodiment in a schematic representation.

Description of the embodiments

The present invention relates to a method for processing a computer program on a computing device, in particular on a microprocessor. The computer program comprises a plurality of program objects, which are preferably designed as tasks. In the method, errors are detected during the execution of the computer program on the computing device. The detected errors can be transient (i.e. temporary) or permanent.

Transient errors can occur during the execution of a computer program on a computing device. Since the structures on the semiconductor devices (so-called chips) of computing devices are becoming ever smaller, while the clock rates of the signals are becoming ever higher and the signal voltages ever lower, transient errors occur more and more frequently during the processing of a computer program on a computing device. In contrast to permanent errors, they appear only temporarily and usually disappear after some time. In the case of transient errors, only individual bits are wrong, without the computing device itself being permanently damaged. Transient errors can have different causes, such as electromagnetic influences, alpha particles or neutrons.

Because transient errors occur almost unpredictably and are therefore not reproducible, error handling in computing devices known from the prior art essentially takes place only for permanent errors. Consideration of transient errors is limited to incrementing and possibly decrementing an error counter. The counter is stored in a memory and can be read out off-line, for example during a workshop visit, as diagnostic or error information. Only then can the error be reacted to accordingly. The known error handling thus allows neither error handling within the short error tolerance time required in particular for safety-relevant systems, nor constructive error handling in the sense that, within the error tolerance time, the computer program is processed properly again and the computing device can fulfill its intended purpose.

In contrast, the method according to the invention allows a transient error of a computer program running on a computing device to be handled with a systematic, constructive approach that takes the transient nature of most errors into account. A flowchart of the method according to the invention, using the example of a runtime object, a so-called task, is shown in FIG. 1. The existence of further tasks does not affect the basic sequence, so they need not be considered here. In the same way as one task is handled according to the sequence shown in FIG. 1, a plurality of tasks can also be handled according to the invention. A parallel error detection mechanism (so-called concurrent checking) is particularly advantageous. Such a mechanism cannot be represented as such in a flowchart and is therefore inserted at the corresponding position as a serial block.

The method according to the invention begins in a function block 1. In function block 1, execution of the task on the computing device is started; the task is called. In a function block 2, a return point is generated. For this purpose, the relevant task input variables that are sufficient to put the task into a defined state for a restart and to restart the task are saved in a memory element of the computing device. Preferably, all input variables of the task are stored. In a function block 3, the task is then processed further. Processing can continue either up to a further return point or up to the end of the task. Then, an error detection mechanism is executed. The error detection can be carried out by any method. The errors are detected during the processing of the computer program (so-called concurrent checking). For example, in a so-called dual-core architecture the entire computer core is duplicated. If the computer cores are operated in a so-called lockstep mode, it can be compared for each instruction whether both computer cores deliver the same results. A difference in the results then reliably indicates an error. Such an error detection mechanism thus detects errors during the processing of the task in real time. The same applies to error-detecting codes that are used throughout the processor architecture, or to duplicated subcomponents of the computing device. Preferably, error detection mechanisms are used that detect transient errors very quickly and deliver an error signal when an error has been detected.

In a query block 4, it is checked whether an error, i.e. a transient or a permanent error, has been detected. If an error has been detected, a branch is made to a further query block 7, where the current value of an error counter logic is checked. If the error counter has not yet fallen below (in the case of a decrementing error counter) or exceeded (in the case of an incrementing error counter) a predefinable counter reading, the task during whose execution the error occurred, or a certain number of tasks that were executed before the error occurred, can be executed again. If it is possible to restart the execution of the task, a branch is made to a function block 8, in which the status of the error counter logic is updated (decremented or incremented) with the information that a further error has occurred. From there, a branch is made to a function block 5, in which the variables stored in function block 2 are loaded and supplied to the task at the beginning of processing in order to generate a defined state. Then a branch is made to function block 3, where the task to be repeated is processed again, either partially, that is, for example, from a return point that has already been passed, or as a whole, that is, from the beginning.

If it turns out in query block 4 that no error occurred during the execution of the task in function block 3, a branch is made to a function block 9, in which the status of the error counter logic is updated with the information that no error has been detected. From there, a branch is made to a query block 11, where it is checked whether the computer program has been completed. If so, a branch is made to a function block 6, the end of the computer program.

Otherwise, a branch is made to a function block 12 where, according to the current task status, a further return point is generated by defining and storing the relevant task input variables that are sufficient to restart the task. From there, a branch is again made to function block 3, where the task to be repeated is restarted and executed again either partially or as a whole.

If it turns out in query block 7 that, due to the state of the error counter logic, a further attempt to reprocess the task is no longer possible, a branch is made to a function block 10. In query block 7, it is checked whether the value of the error counter logic for this task is greater than a task-dependent repeat value. This task-dependent repeat value can be specified either identically for different tasks or individually for each task. In this way it is possible, for example, for particularly important tasks to be repeated several times before a permanent error is reported. If the task-dependent repeat value is specified as 1, the task is repeated only once before a permanent error is detected. If the task-dependent repeat value is set to 2 or 3, the task is repeated 2 or 3 times before a permanent error is detected. In this case, the task has more time, or more runs, available until the transient error no longer occurs. In function block 10, a permanent error is detected and a corresponding measure is initiated. This measure may be, for example, to switch the computer program into a limp-home mode, or initially to do nothing and then to terminate the execution of the computer program.
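The sequence of blocks 1 to 12 described above can be traced in a minimal software sketch. The following is an interpretation under stated assumptions, not the implementation of the method: the task is modeled as a list of segments separated by return points, the error signal is a hypothetical exception `DetectedError`, the error counter is an incrementing counter that is reset after an error-free segment, and the function name `run_task` and the returned block trace are illustrative only.

```python
class DetectedError(Exception):
    """Stands in for the error signal of the error detection mechanism."""


def run_task(segments, inputs, repeat_value):
    """Walk the blocks of FIG. 1; return (state, outcome, visited blocks).

    segments:     callables state -> new state, one per section between
                  return points (an assumption of this sketch)
    repeat_value: task-dependent number of permitted repetitions
    """
    trace = [1]                  # block 1: the task is called
    saved = dict(inputs)         # block 2: first return point is stored
    trace.append(2)
    error_count = 0
    index = 0
    while True:
        trace.append(3)          # block 3: process up to the next return point
        try:
            state = segments[index](dict(saved))
            error = False
        except DetectedError:
            error = True
        trace.append(4)          # block 4: was an error detected?
        if error:
            trace.append(7)      # block 7: is a further repetition allowed?
            if error_count >= repeat_value:
                trace.append(10)  # block 10: permanent error, start a measure
                return None, "permanent", trace
            error_count += 1     # block 8: record that an error occurred
            trace.append(8)
            trace.append(5)      # block 5: next pass reuses the saved variables
            continue
        error_count = 0          # block 9: record the error-free run (assumed reset)
        trace.append(9)
        trace.append(11)         # block 11: has the program been completed?
        index += 1
        if index == len(segments):
            trace.append(6)      # block 6: end
            return state, "ok", trace
        saved = dict(state)      # block 12: a further return point is stored
        trace.append(12)
```

With a repeat value of 1 and a second segment that fails once, the trace shows that only the second segment is repeated, because execution resumes from the return point stored in block 12 rather than from the start of the task.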

The method according to the invention does not necessarily have to include all the function and query blocks illustrated in FIG. 1 and explained above. Thus, for example, blocks 7 to 9, which relate to the error counter logic, can be dispensed with. Upon detection of an error, the task or tasks to be restarted and executed would then be repeated until the error no longer occurs. A permanent error would not be detected, so that function block 10 could also be omitted. Alternatively, the task-dependent repeat value can be specified as 1, so that function blocks 8 and 9 for updating the error counter could be omitted. Finally, it would also be possible to dispense with blocks 11 and 12 if only a single task with a single return point is executed.

FIG. 2 shows a computing device according to the invention for executing a computer program, in accordance with a preferred embodiment. The computing device is designated in its entirety by the reference numeral 20. The computing device comprises a memory element 21, which is designed, for example, as an electronic memory, in particular as a flash memory. In addition, the computing device 20 includes a microprocessor 22, on which a computer program can be processed. The computer program is stored on the electronic memory element 21 and is designated by the reference numeral 23. For processing on the microprocessor 22, the computer program is transferred to the microprocessor 22 via a data connection 24, either as a whole or in sections, for example instruction by instruction. The data connection 24 can be designed as one or more data lines or as a bus system for data transmission. An operating system is also stored on the memory element 21; it is at least partially transferred from the memory element 21 to the microprocessor 22 and executed there when the computing device 20 is started up. The operating system is designated by the reference numeral 25. It has the task of controlling and managing the processing of the computer program 23 on the microprocessor 22 and the peripherals connected to the computing device 20. According to the present invention, the operating system 25 is designed in a special way: it is programmed to carry out the method according to the invention and carries out that method when it runs on the microprocessor 22. In particular, the operating system 25 includes access to an error detection mechanism for detecting an error during the execution of the computer program 23 on the microprocessor 22. In addition, the operating system 25 includes an error handling mechanism that, upon detection of an error, causes at least one program object (a task) of the computer program 23 to be executed again.

Claims

1. A method for processing a computer program (23) on a computing device (20), in particular on a microprocessor (22), wherein the computer program (23) comprises a plurality of program objects and, in the method, errors are detected during the execution of the computer program (23) on the computing device (20), characterized in that, upon detection of an error, at least one program object that has already been fed to processing is transferred into a defined state and restarted from this state.

2. The method according to claim 1, characterized in that the program objects are designed as tasks of the computer program (23) and, upon detection of an error, at least one task is executed again.

3. The method according to claim 1 or 2, characterized in that a program object being executed at the time of detection of the error is executed again.

4. The method according to any one of claims 1 to 3, characterized in that, during the processing of the program objects, in particular at the beginning of the execution of the program objects, at least one defined state of the program objects is generated and stored.

5. The method according to any one of claims 1 to 4, characterized in that a further computing device operating redundantly to the computing device (20) is used for error detection.

6. The method according to any one of claims 1 to 5, characterized in that the method is used in a motor vehicle, in particular in a motor vehicle control unit.

7. The method according to any one of claims 1 to 6, characterized in that a permanent error is concluded (12) if the same error occurs again in the re-execution of the at least one program object.

8. The method according to claim 7, characterized in that the number of repetitions of the at least one program object is limited to a predeterminable value.

9. The method according to claim 8, characterized in that the number of repetitions of the at least one program object is dynamically limited to a predefinable value.

10. The method according to claim 9, characterized in that the number of repetitions of the at least one program object is dynamically limited to a predefinable value as a function of the time still remaining for scheduling.

11. The method according to any one of claims 1 to 10, characterized in that, during the execution of the computer program (23), the values of the variables necessary for processing a program object are stored (3) prior to the execution of that program object.

12. The method according to any one of claims 1 to 10, characterized in that, in the case of a computer program (23) that is processed periodically in a period, execution jumps back, upon detection of an error, to a specific program object at a predefinable return point in the period of the computer program (23).

13. The method according to claim 12, characterized in that, during the execution of the computer program (23), all input variables applied to the program object are stored only before the execution of a program object at the return point.

14. The method according to claim 11 or 13, characterized in that during the re-execution of a program object after the detection of an error, the program object is executed again with the input variables stored for this program object.

15. An operating system (25) that is executable on a computing device (20), in particular on a microprocessor (22), characterized in that the operating system (25) is programmed to carry out a method according to one of claims 1 to 14 and carries out the method when it runs on the computing device (20).

16. A computing device (20) for processing a computer program (23) comprising a plurality of program objects, wherein the computing device (20) has an error detection mechanism for detecting an error during execution of the computer program (23) on the computing device (20), characterized in that the computing device (20) has an error handling mechanism which, upon detection of an error by the error detection mechanism, causes at least one program object that has already been fed to processing to be transferred into a defined state and restarted from this state.

17. A computing device (20) according to claim 16, characterized in that the error handling mechanism has a trigger logic which, when a fault is detected, restarts the at least one program object.

18. A computing device (20) according to claim 16 or 17, characterized in that a real-time operating system (25) runs on the computing device (20).

19. A computing device (20) according to any one of claims 16 to 18, characterized in that the computing device (20) comprises a microprocessor (22).