EP2761436A1 - Method for operating a processor

Info

Publication number: EP2761436A1
Application number: EP12702491.7A
Authority: EP
Inventors: Rene Graf; Wolfgang Hartmann
Original assignee: Siemens AG
Current assignee: Siemens AG
Priority date: 2012-01-31
Filing date: 2012-01-31
Publication date: 2014-08-06
Also published as: CN104081344B; US20140325185A1; CN104081344A; WO2013113366A1

Abstract

The invention relates to a method for operating a processor, in which a first program (10) is provided comprising a first sequence of commands (32), at least one second program (12) is provided comprising a second sequence of commands (24), said first program (10) comprising a time-critical section (52) with time-critical commands (20), the commands (20, 24, 32) from said first and second programs (10, 12) are processed in a processor pipeline (18), a start time is identified for said time-critical section (52) in the first program (10), and a predefined interrupt program (26) is incorporated into said at least one second program (12) once the start time of the time-critical section (52) in the first program (10) has been identified.

Description

Method for operating a processor

The present invention relates to a method for operating a processor. Moreover, the present invention relates to a processor. Finally, the present invention relates to an automation device with a processor.

Modern processors today have more than one core and are referred to as multi-core processors (MC), because they can run multiple programs truly in parallel. Each program runs on a separate processor core (core) and does not have to share this core, with all its subunits such as the floating point unit (FPU), with any other program while it is running.

Even before the first multi-core processors, there were attempts to create a certain degree of parallelism without having to provide an additional complete processor core alongside the existing one. This technique is known as Hyper-Threading (HT). Within a processor core there are units that handle different tasks. Examples are the load-store unit (LSU), which exchanges data between processor registers and memory, and the arithmetic logic unit (ALU), which handles integer calculations.

These units can already partially work in parallel even in a processor with one processor core, as long as the data is available. Thus, the ALU may operate on values in certain registers while the LSU is loading or storing other registers. Internally, the processor has a so-called pipeline in which the individual instructions are processed in succession, the individual stages of the pipeline corresponding to the various units in the processor. Most of the time, however, the units have to wait for each other, so the pipeline can only be partially filled and the theoretically possible computing power is not exploited. HT processors largely remove this limitation by presenting two or more processors to the operating system, even though internally only parts of the additional processors are available.

For example, two quasi-parallel programs run on a processor with one processor core (single-core processor). Each program contains a list of instructions (instruction queue), which must be processed one after the other. In a single-core processor, the order is unambiguous and the run time of the command chain is always the same, even after any number of passes. The programs run deterministically in this case. In an HT processor, however, the processor internally mixes the commands from the two available command lists in order to fill the pipeline as fully as possible and to reach the theoretically possible overall computing power. This works quite well for many programs, although double or n-fold performance cannot be reached, because not all units are duplicated. Complex units such as the FPU in particular are usually available only once in HT processors. If two programs that both use floating-point arithmetic run in parallel, the HT processor is no faster than a normal single-core processor. However, since this is rarely the case, an HT processor generally enables an increase in computing power.

However, one drawback is that the run time of one program is no longer predictable, as it depends to a significant degree on the instructions of the second program. Over many runs, many different run times will therefore result. Determinism plays no role in general-purpose operating systems (GPOS), since the programs there do not have to complete their task within a certain time. In the field of real-time systems, in which real-time operating systems (RTOS) are also used, a predictable run time is however an essential part of the application. This circumstance has so far prevented the use of HT processors for real-time systems. In many configurations there is, in addition to the real-time application, a non-real-time application such as a user interface. HT processors offer a cost-effective alternative to true multi-core processors for obtaining more computing power, at least for non-real-time applications, but cannot be used because of their massive influence on the run time of the real-time application. Even true multi-core computers still have certain limitations, since in addition to the processor, in which all units are present multiple times, other components such as memory or peripherals are used that exist only once in the entire system and can thus lead to the individual programs blocking each other.

Therefore, it is an object of the present invention to provide a way to increase the performance and reliability of a processor.

This object is achieved by a method for operating a processor according to claim 1. In the same way, this object is achieved by a processor according to claim 7 and by an automation device according to claim 10. Advantageous developments of the present invention are specified in the subclaims.

The method according to the invention for operating a processor comprises providing a first program with a first sequence of commands, providing at least one second program with a second sequence of commands, wherein the first program comprises a time-critical section with time-critical commands, processing the commands from the first and second programs in a processor pipeline, detecting a start time of the time-critical section in the first program, and inserting a predetermined interrupt program into the at least one second program as soon as the start time of the time-critical section in the first program is detected.

The term command is to be understood in particular as a machine command of the processor. The commands can be assigned to the respective units of the processor. The term section denotes in particular a plurality of contiguous commands. A program is understood in particular to be a list of instructions that is executed by the processor in order to achieve a specific functionality. The commands can be assigned to corresponding sections.

In the present case, at least two programs are provided on a processor, each comprising a sequence of commands. These programs may also be referred to as instruction queues. The processor comprises at least two logical cores, each of which is associated with one program. The processor is thus designed as a hyper-threading processor (HT processor). The first program on the processor may comprise a time-critical section, that is, time-critical commands or command sequences. The first program may therefore include both non-time-critical sections and time-critical sections. The processor, or an operating system executed on a memory device of the processor, is adapted to recognize a start time of the time-critical section in the first program. As soon as the start time of the time-critical section in the first program is detected, a predetermined interrupt program is inserted into the at least one second program. In other words, the section currently provided or processed in the at least one second program is interrupted or halted by a corresponding interrupt program. This interrupt program may comprise a previously known sequence of commands and is therefore exactly predictable. For the interrupt program it is preferably known exactly which commands are executed and which units of the processor are accessed. A processor on which time-critical commands or sections are processed can thus also be made available to non-time-critical parts of a system without adversely affecting the time-critical processes. The essential reason why the mixing of sections or commands in a conventional HT processor leads to non-deterministic execution of a command list is that it is unpredictable which command is present in the other program. According to the invention, this problem is solved by defining the commands or sections of the other programs that may still be processed while a time-critical sequence is running.

Preferably, at the start of the time-critical section in the first program, an interrupt signal is sent to the second program in order to insert the interrupt program. The time-critical section or time-critical sequence is generally started by an interrupt, since the time-critical process must react to an event. Such an event can occur cyclically, for example controlled by a timer, or sporadically, for example as a result of an alarm. The start of the time-critical section in the first program now sends an interrupt signal, or interrupt, to the at least one second program of the processor, the second program being regarded by the operating system as an independent and complete core. The interrupt program inserted into the second program, which may also be referred to as an interrupt service routine (ISR), puts the second processor core into a defined state. Thus, in a processor having at least two logical cores, the at least one second logical core can be brought into a defined state when a time-critical section is detected on the first logical core.

In a preferred embodiment, the time-critical section is processed together with the interrupt program in a predictable sequence in the processor pipeline. If a time-critical section is detected in the first program, a previously defined interrupt program is inserted into the at least one second program. Since this interrupt program, which may also be referred to as an idle task, consists of known instructions, its mixing with the unknown instructions of the time-critical section in the processor pipeline is nevertheless unambiguous. As a result, the same run times always result, even over several runs of the time-critical program.

Halting the second or further logical cores or programs by means of the interrupt program makes it possible to preserve the necessary determinism of the time-critical section. In addition, the described method achieves a higher throughput in the processing of non-time-critical programs compared to a single-processor solution, because the non-time-critical programs can use more than one logical processor core whenever no time-critical section is being processed. If the time-critical section is not active, the non-time-critical programs can use the at least two logical cores of the HT processor. The non-deterministic mixing is irrelevant here, because non-time-critical programs are judged by their data throughput, which increases through parallel processing. Thus, the performance of the entire processor or device can be increased without affecting reliability.

Preferably, the interrupt program is also terminated when the time-critical section ends. The non-time-critical program is thus interrupted or halted by the interrupt program only for the period that the time-critical section requires for processing. Following this, a non-time-critical program can again be processed on both logical processor cores. The processing of non-time-critical programs or commands on the processor can thus be accelerated.

In one embodiment, the interrupt program comprises reading a value from a memory, comparing the read value with a predetermined value, and restarting the interrupt program if the read value and the predetermined value differ. In the interrupt program, or idle task, a value or datum is first read from a previously defined memory cell. It is then compared with a predefined value. If the two do not match, the value in the memory is read again and the routine is thus restarted. These commands also define which units of the processor are used. In particular, units are used that are present multiple times in the processor or that belong solely to the second program or second logical core. Such units may be, for example, a load-store unit, which can load corresponding values from a memory. In addition, a so-called compare unit can be used, which can compare corresponding values. In particular, an internal memory cell of the processor can be used as the memory cell from which the value is read. This prevents the physical main memory from being accessed during the interrupt program. The units of the processor used by the interrupt routine are thus known. The other units are fully available to the program or logical core that processes the time-critical section. As a result, the run time of the time-critical section or command list is not adversely affected.

In one embodiment, the interrupt program is terminated by writing to the memory a value that corresponds to the predetermined value. During the interrupt routine, a value can be read from an internal memory of the processor and compared with a predetermined value. As long as the time-critical section has not been completed, the predetermined value is not written to the memory. Only after the time-critical section has been processed is the predetermined value written to the memory. This also ends the routine of the interrupt program. It is advantageous to select a memory cell located within the processor and not in the physical main memory, so that the interrupt program, or idle task, does not occupy any singular resources such as memory accesses or bus systems. The interrupt program can thus be terminated in a simple manner, and the non-time-critical commands can again be processed on the at least two programs or logical cores.
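The read, compare and restart steps, together with termination by writing the predetermined value, can be sketched as follows. This is a minimal, single-threaded Python model under stated assumptions, not the patented implementation: the names RELEASE_VALUE, interrupt_program and run_time_critical are hypothetical, a dictionary stands in for the processor-internal memory cell, and one poll of the idle task is modeled per time-critical command.

```python
RELEASE_VALUE = 0xA5  # hypothetical "predetermined value"


def interrupt_program(memory, cell):
    """Idle task of the second logical core, modeled as a generator.

    Each loop iteration reads the cell and compares it with RELEASE_VALUE.
    If the values differ, the routine restarts (one poll is yielded);
    if they match, the generator returns and the idle task ends.
    """
    while True:
        value = memory[cell]        # read step
        if value == RELEASE_VALUE:  # compare step
            return                  # values match: interrupt program ends
        yield "idle"                # values differ: restart the routine


def run_time_critical(memory, cell, commands):
    """Run the time-critical commands while the idle task polls the cell."""
    idle = interrupt_program(memory, cell)
    trace = []
    for command in commands:
        trace.append(command)            # first logical core
        poll = next(idle, None)          # second logical core: one poll
        if poll is not None:
            trace.append(poll)
    memory[cell] = RELEASE_VALUE         # end of section: write the value
    finished = next(idle, "done") == "done"
    return trace, finished


memory = {"cell": 0}
trace, finished = run_time_critical(memory, "cell", ["t1", "t2"])
print(trace, finished)
```

In this model the idle task touches only the one memory cell, which mirrors the point made above that the units used by the interrupt program are known in advance.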

The processor according to the invention comprises a first processor unit for providing a first program with a first sequence of commands, at least one second processor unit for providing at least one second program with a second sequence of commands, wherein the first program comprises a time-critical section, a processor pipeline for processing the commands from the first and second programs, and a memory device with an operating system, wherein the processor is adapted to execute the operating system and wherein the processor or the operating system is adapted to recognize a start time of the time-critical section in the first program and to insert a predetermined interrupt program into the at least one second program as soon as the start time of the time-critical section in the first program is detected.

Preferably, the processor includes a data exchange unit and a comparison unit, wherein the data exchange unit is adapted to read a value from a memory during the interrupt program, and wherein the comparison unit is adapted to compare the read value with a predetermined value during the interrupt program. It can thus be ensured that only certain units of the processor are used during the interrupt program. Preferably, units are used that are available solely to the respective logical processor core or program. Likewise, an internal memory of the processor can preferably be used.

In a further embodiment, the processor comprises at least two processor cores. The method described above for operating a processor can be used not only for HT processors but also for true multi-core processors in which all cores have all the necessary units. In common multi-core processors, even a strict division of time-critical and non-time-critical programs among the respective processor cores can lead to non-deterministic behavior, since both sub-applications access other system components such as memory or peripherals. The latter in particular causes problems, even if the different applications use different peripherals, since the connection between the processor with its multiple cores and the multiple peripherals is established via single bus systems (usually PCI or PCIe). The mutual blocking that may result can be ruled out by the method described.

Another advantage is the deterministic cache behavior that the method also enables. In many multi-core and hyper-threading processors, several cores share the cache. If one of the programs or commands on a core cannot be served from the cache but must access main memory, the overall execution time slows down considerably. If a defined interrupt program runs on the at least one further program or core while a time-critical section is executing, the real-time application cannot be influenced by that program's cache behavior.

The automation device according to the invention comprises a processor as described above. Automation devices usually have two essential functions: the control of a physical process and communication with the outside world. The communication can take place either through a user interface or via a network connection to an external HMI device. The control of the physical process usually requires compliance with defined time conditions, whereas in communication corresponding waiting times must be allowed for anyway because of the inertia of the user. The control function, which is usually a time-critical section, usually takes up only a small part of the computing power of the processor. The control tasks should be executable at any time. The communication programs usually take up a significantly larger share of the computing power in order to visualize states or exchange data with other devices. By using the processor according to the invention, significantly more computing power can be provided to the communication programs. In addition, costs can be saved with the processor.

The advantages and developments mentioned above in connection with the method according to the invention apply in the same way to the processor according to the invention and to the automation device according to the invention.

The present invention will now be explained in more detail with reference to the accompanying drawings, in which:

FIG. 1 shows a schematic representation of the program sequences in a hyper-threading processor;

FIG. 2 shows a schematic representation of the command arrangement of two programs, wherein the first program comprises a time-critical section;

FIG. 3 shows a schematic representation of the command arrangement of two programs and a processor pipeline;

FIG. 4 shows a schematic representation of a test arrangement; and

FIG. 5 shows a schematic representation of another test arrangement.

The exemplary embodiments described in more detail below represent preferred embodiments of the present invention.

FIG. 1 shows a schematic representation of the execution of programs on a processor according to the prior art. Such a processor is called a hyper-threading processor. The processor comprises one processor core on which, for example, two processor units are provided. Each processor unit is assigned to a program. In the present case, a first program 10 and a second program 12 are run on the processor. The first program 10 comprises a first sequence of instructions 14 and the second program 12 comprises a second sequence of instructions 16.

The order of the instructions 14 in the first program 10 and of the instructions 16 in the second program 12 is clearly defined. Time-critical and non-time-critical sections can be processed in the first program 10 and in the second program 12. The instructions 14 from the first program 10 and the instructions 16 from the second program 12 are processed in the processor pipeline 18. In the processor pipeline 18, the instructions 14 and 16 are sorted accordingly and processed. This results in a non-deterministic and unordered sequence of the first instructions 14 and the second instructions 16 in the processor pipeline.

The first program 10 and the second program 12 can be regarded as logical processor cores, although the processor includes only one processor core. Depending on the processor manufacturer, different units of the processor may be present once or several times. Usually, the units that perform simple arithmetic operations are present multiple times, while more complex units are present only once.

FIG. 2 shows a schematic representation of two programs 10 and 12, the first program 10 comprising a time-critical section 52. Such a time-critical section 52 is usually started by an interrupt, since it must respond to an event. Such an event may, for example, occur cyclically or only sporadically. The start time of the time-critical section 52 is indicated by the arrow 22. The second program 12 includes a non-time-critical section 54. The processor, or an operating system executed on a memory device of the processor, is configured to recognize the start time of the time-critical section 52 in the first program. As soon as the start time of the time-critical section 52 is recognized in the first program 10, a predetermined interrupt program 26 is inserted into the second program 12.

With the start of the time-critical section 52, a corresponding interrupt signal, or interrupt, is sent to the second program 12. As a result, the non-critical section 54 being processed in the second program 12 is interrupted and an interrupt program 26 is inserted into the non-critical section 54. The interrupt that is transmitted from the first program 10 to the second program 12 is indicated in FIG. 2 by the arrow 28. The interrupt program, which may also be referred to as an idle task, may include the following steps:

- reading a value from a memory,

- comparing the read value with a previously defined value, and

- restarting the interrupt program if the read value and the defined value differ.

This command sequence also defines which units of the processor are used. In the present case, only the load-store unit and the compare unit are used. In addition, an internal memory of the processor is preferably used. With the end of the time-critical section 52, the interrupt program 26 is also ended. When the time-critical section 52 is completed, a write command writes the previously defined value into the memory cell that is continuously read by the interrupt program 26. This is illustrated by way of example in FIG. 2 by the arrow 30.

FIG. 3 shows a schematic representation of the processing of the commands in the first program 10, in the second program 12 and in the processor pipeline 18. The first program 10 comprises non-time-critical sections 56 or commands 32 as well as time-critical sections 52 or commands 20. If the start time of a time-critical section 20 is detected in the first program 10, a previously defined interrupt program 26 is inserted into the second program 12. In this case, the non-time-critical section 24 currently being processed in the second program is interrupted or halted. The sections or instructions 20, 24, 26, 32 from the programs 10 and 12 are processed in the processor pipeline 18.

Before a time-critical section 52 has been detected, the instructions 24, 32 of the first program 10 and the second program 12 are processed in an unpredictable order; this command sequence is illustrated by region 38 in the processor pipeline 18. As soon as the start time of a time-critical section 52 is detected, the interrupt program 26 is inserted into the second program. The interrupt program 26 includes the previously defined steps. Processing the time-critical section 52 of the first program together with the interrupt program 26 from the second program 12 results in a deterministic and predictable order of the instructions 20, 26. This is shown in region 36 of the processor pipeline. After the time-critical section 52 has finished, the interrupt program 26 is also ended. Following this, the non-time-critical sections 54, 56 from the first program 10 and the second program 12 are processed. This is illustrated by region 34 in the processor pipeline 18.
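The three pipeline regions can be illustrated with a small simulation. The following Python sketch is a toy model under stated assumptions: the function names mix and pipeline, the command labels, and the use of a seeded random generator to stand in for the processor's unpredictable mixing are all hypothetical, and the strict alternation inside the time-critical region is a simplification.

```python
import random


def mix(a, b, rng):
    """Interleave two command lists in a random order.

    Each list keeps its own internal order, as in an instruction queue;
    only the interleaving between the two queues varies.
    """
    a, b, out = list(a), list(b), []
    while a or b:
        take_a = bool(a) and (not b or rng.random() < 0.5)
        out.append((a if take_a else b).pop(0))
    return out


def pipeline(seed):
    """Return the command order in the regions before, during and after
    a time-critical section (regions 38, 36 and 34 in the description)."""
    rng = random.Random(seed)
    # Region 38: non-time-critical commands of both programs, mixed freely.
    before = mix(["a1", "a2"], ["b1", "b2"], rng)
    # Region 36: time-critical commands mixed only with the known idle task.
    critical = []
    for command in ["t1", "t2", "t3"]:
        critical += [command, "idle"]
    # Region 34: free mixing resumes once the idle task has ended.
    after = mix(["a3", "a4"], ["b3", "b4"], rng)
    return before, critical, after


for seed in (0, 1):
    print(seed, pipeline(seed))
```

Across different seeds the order in the first and last regions may vary, while the order in the time-critical region stays the same, which is the property the method aims for.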

FIG. 4 shows the schematic representation of a test arrangement for the quantitative evaluation of the method according to the invention. In this case, a first program 10 comprises a time-critical section 52. A second program 12 comprises a non-time-critical section 54. In a first test condition, the first program 10 and the second program 12 are each assigned a logical core of a hyper-threading processor. In a further test condition, the first program 10 and the second program 12 may each be assigned a processor core of a multi-core processor.

In this test scenario, a floating point unit (FPU) is used for calculations. In an HT processor with two logical cores, such a floating point unit is available only once. The following test scenario is implemented: the non-time-critical section 54 continuously carries out calculations that require the FPU. The time-critical section 52 is triggered accordingly and also performs calculations on the FPU. The time-critical section 52 measures the time taken for the calculations on the FPU.

The sections 52, 54 are distributed specifically over the two logical cores of an HT processor and of an MC processor, so that they run in parallel as the programs 10, 12. The following table shows the measured values for the run time of the time-critical section 52 in different test cases in which no measures were taken in the operating system. For each measurement, 60,000 runs were made.

The times in line 2 show very clearly the effect in the HT processor: the execution times on the logical core on which the time-critical section 52 runs increase significantly if an FPU application is also running on the other core. In the worst case, even twice the time is needed, which occurs when both logical cores of the HT processor use the FPU in parallel. On a true multi-core processor, by contrast, the two sections 52, 54 on the two cores do not influence each other at all, as was to be expected.

The following table shows the measured values when the method according to the invention is implemented, in which an interrupt program, or idle task, is sent to the second core or logical core by interrupt, so that no collisions should occur.

As the measured values clearly show, the times in this case remain very constant. They are slightly higher than the values above because of the additional mechanism, in which an interrupt program 26 is also processed. In the first line, even the worst times are better than in the table above, since any influence by programs on another logical core is prevented.

In summary, it can be stated that the method according to the invention works and contributes significantly to improving the determinism of the time-critical sections. At times when the time-critical section 52 is not running, the previous computing power remains available on the other cores.

Another important aspect, in addition to the mutual interference of the logical cores of an HT processor, is access to the peripherals, since such accesses can severely disrupt the time-critical sections or applications: collisions on a bus system can also lead to unpredictable delays. In another test scenario, a PCI card was provided in a computer, which cyclically generates interrupts at intervals of one millisecond. The corresponding interrupt service routine (ISR) performs various time measurements and sends a signal to the application, which then continues. The ISR issues a write command to the PCI card.

The test scenario is shown schematically in FIG. In this case, a multi-core processor comprises four cores 40, 42, 44 and 46. The time-critical program 10, including the ISR, runs on the first core 40 of the multi-core processor, while on each of the other three cores 42, 44, 46 a corresponding application permanently performs read accesses to a PCI card 50 via a PCI bus 48. The following table shows the measured values that occur when the system is run as described. A total of 60,000 cycles were run in order to obtain statistical information.

                      Minimum time (ns)   Average time (ns)   Maximum time (ns)

ISR latency                        6960                9210               10580

PCI read command                   1040                2007                2147

PCI write command                    12                  14                  18

Duration ISR in total              3910                6010                6533

When the method according to the invention is used, the measured values given in the following table result. Here, too, 60,000 cycles were run.

ISR latency times improve on average by 30% due to the reduced bus load. With the method according to the invention, the PCI read accesses become significantly more stable and are subject to hardly any fluctuations, whereas previously the worst times were about 100% above the best. As expected, the PCI write accesses do not change at all, since they always write only to a buffer. However, the time this value takes to arrive in the register of the PCI card should also be significantly improved or stabilized, although this has not been verified. The total duration of the ISR also decreased by more than 30%.
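As a cross-check of the statement that the worst PCI read times were previously about 100% above the best, the relative spread between minimum and maximum can be computed from the table of values measured without the method. The following Python sketch is illustrative only: the function name spread_percent and the dictionary layout are assumptions, while the numbers are taken from that table.

```python
# (minimum, average, maximum) in nanoseconds, measured without the method
measurements = {
    "ISR latency": (6960, 9210, 10580),
    "PCI read command": (1040, 2007, 2147),
    "PCI write command": (12, 14, 18),
    "Duration ISR in total": (3910, 6010, 6533),
}


def spread_percent(minimum, maximum):
    """How far the worst time lies above the best time, in percent."""
    return 100.0 * (maximum - minimum) / minimum


spreads = {
    name: round(spread_percent(lo, hi), 1)
    for name, (lo, _avg, hi) in measurements.items()
}
for name, value in spreads.items():
    print(f"{name}: worst time is {value}% above the best")
```

For the PCI read command this gives roughly 106%, which is consistent with the "about 100%" stated above.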

Reference sign list

10 program

12 program

14 command

16 command

18 processor pipeline

20 command

22 arrow

24 section

26 interrupt program

28 arrow

30 arrow

32 section

34 area

36 area

38 area

40 processor core

42 processor core

44 processor core

46 processor core

48 PCI bus

50 PCI card

52 section

54 section

56 section

Claims

1. A method for operating a processor by

- providing a first program (10) with a first sequence of instructions (32),

- providing at least one second program (12) with a second sequence of instructions (24), wherein

the first program (10) comprises a time-critical section (52) with time-critical commands (20), and

- processing the instructions (20, 24, 32) from the first and second programs (10, 12) in a processor pipeline (18), characterized by

- detecting a start time of the time-critical section (52) in the first program (10), and

- inserting a predetermined interrupt program (26) into the at least one second program (12) as soon as the start time of the time-critical section (52) in the first program (10) is detected.

2. The method according to claim 1, characterized in that with the start of the time-critical section (52) in the first program an interrupt signal to the second program (12) for inserting the interrupt program (26) is sent.

3. The method according to claim 1 or 2, characterized in that the time-critical section (52) is processed together with the interrupt program (26) in a predictable sequence in the processor pipeline (18).

4. The method according to any one of the preceding claims, characterized in that the interrupt program (26) is also terminated when the time-critical section (52) ends.

5. The method according to any one of the preceding claims, wherein the interrupt program (26) comprises the following steps:

- reading a value from a memory,

- comparing the read value with a predetermined value, and

- restarting the interrupt program (26) if the read value and the predetermined value differ.

6. The method according to claim 4 or 5, wherein the interrupt program (26) is terminated by writing to the memory a value corresponding to the predetermined value.

7. Processor with

a first processor unit for providing a first program (10) with a first sequence of instructions (32), at least one second processor unit for providing at least one second program (12) with a second sequence of instructions (24), wherein

the first program comprises a time-critical section (52) with time-critical commands (20),

a processor pipeline (18) for processing the instructions (20, 24, 32) from the first and second programs (10, 12), and

a memory device with an operating system, wherein the processor is designed to execute the operating system,

characterized in that

- the processor or the operating system is adapted to detect a start time of the time-critical section (52) in the first program (10) and to insert a previously defined interrupt program (26) into the at least one second program (12) as soon as the start time of the time-critical section (52) in the first program (10) is detected.

8. Processor according to claim 7, characterized in that the processor comprises a data exchange unit and a comparison unit, wherein the data exchange unit is adapted to read a value from a memory during the interrupt program (26), and wherein the comparison unit is adapted to compare the read value with a predetermined value during the interrupt program (26).

9. Processor according to claim 7 or 8, characterized in that the processor comprises at least two processor cores.

10. Automation device with a processor according to one of the preceding claims.