EP1790132A1

EP1790132A1 - Distributed communication system using two communication controllers as well as method for operating such communication system

Info

Publication number: EP1790132A1
Application number: EP05774640A
Authority: EP
Inventors: Jörn Ungermann
Original assignee: Philips Intellectual Property and Standards GmbH; Koninklijke Philips Electronics NV
Current assignee: NXP BV
Priority date: 2004-09-02
Filing date: 2005-08-17
Publication date: 2007-05-30
Also published as: JP2008512021A; WO2006024982A1; CN101053216A

Abstract

In order to provide a method for operating a distributed communication system, in particular for synchronizing clocks in such distributed communication system, with a number of nodes (100; 100'; 100") being interconnected by at least one communication link comprising at least two channels (10, 20), wherein it is possible to maintain access of the application host (60) to at least one of the channels (10, 20) attached to the communication controller (30, 32) in case this communication controller (30, 32) fails or breaks down, it is proposed that each channel (10, 20) is controlled by its own communication controller (30, 32).

Description

DISTRIBUTED COMMUNICATION SYSTEM USING TWO COMMUNICATION CONTROLLERS AS WELL AS METHOD FOR OPERATING SUCH COMMUNICATION SYSTEM

The present invention relates in general to the architecture for fault-tolerant time-triggered communication systems.

The present invention in particular relates to a method for operating a distributed communication system, in particular for synchronizing clocks in such distributed communication system, with a number of nodes being interconnected by at least one communication link comprising at least two channels.

The present invention further relates to a node of a distributed communication system with a number of nodes being interconnected by at least one communication link comprising at least two channels.

In Figure 1, a typical fault-tolerant time-triggered network is shown by way of a schematic. This network comprises two channels C1, C2 to which respective nodes N are connected. Each of these nodes N comprises bus drivers B1, B2, a communication controller CC with a protocol engine P and a controller host interface CI, optionally a bus guardian device for each bus driver B1, B2, and the application host H.

The bus driver B1, B2 transmits the bits and bytes, which are provided by the communication controller CC, onto its connected channel C1, C2 and in turn provides the communication controller CC with the information received on the channel C1, C2. The communication controller CC is connected to both channels C1, C2 via its bus drivers B1, B2, delivers relevant data to the host application H, and receives data from the host application H, which it in turn assembles into frames and delivers to the bus drivers B1, B2.

The bus driver B1, B2, the (optional) bus guardian and the host device H are at least partially time-triggered, meaning that the time is sliced into recurring cycles where each cycle comprises several segments. Each node N determines the start of a new cycle according to its own built-in clock. At least one segment is divided into a fixed number of slots, where each slot is assigned to at most one communication controller CC, and that communication controller CC alone has the right to transmit in it. Other segments of the cycle can be used for dynamic arbitration schemes or for other purposes.

The bus guardian is a device with an independent set of configuration data that enables the transmission on the bus only during those slots which are specified by the configuration set.
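The slot ownership rule and the gating performed by the bus guardian can be sketched as follows. This is only a minimal illustration under stated assumptions: the names `Schedule` and `guardian_allows`, the tick-based time unit, and the placement of the slotted segment at the start of the cycle are hypothetical and are not taken from the TTP/C or FlexRay specifications.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Schedule:
    """Hypothetical configuration set: one slotted segment at the start of each cycle."""
    cycle_length: int                      # local clock ticks per cycle
    slot_length: int                       # ticks per slot
    slot_owner: Tuple[Optional[str], ...]  # owner of slot k, or None if unassigned

    def slot_at(self, local_time: int) -> Optional[int]:
        """Return the slot index at local_time, or None outside the slotted segment."""
        position = local_time % self.cycle_length
        slot = position // self.slot_length
        return slot if slot < len(self.slot_owner) else None


def guardian_allows(guardian_schedule: Schedule, controller_id: str, local_time: int) -> bool:
    """The guardian consults its own copy of the schedule, not the controller's."""
    slot = guardian_schedule.slot_at(local_time)
    return slot is not None and guardian_schedule.slot_owner[slot] == controller_id


# Four slots of 10 ticks each; the rest of the 100-tick cycle is left to other segments.
schedule = Schedule(cycle_length=100, slot_length=10, slot_owner=("A", "B", None, "C"))
```

Because each slot has at most one owner, at most one controller passes the check at any local time, and a controller that tries to transmit outside its slots is blocked even if its own configuration is corrupted.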

The host application H contains the data source as well as data sink and is generally not concerned with the protocol activity. Only decisions that the communication controller CC cannot do alone are made by the host application H.

The nodes N have to be synchronized to each other because each node N derives on its own the start of the cycle and thereby the placement of all segments and slots in time. Each node N has its own clock so as not to depend on a single master clock whose failure would collapse the whole system. The difference between its own clock and the clocks of some subset of nodes of the system, which are called synchronization nodes (or sync nodes), is used to correct its own clock in a fault-tolerant way.

Two kinds of clock correction are possible: pure offset correction, and offset correction combined with rate correction. The offset correction only corrects the clock offset, whereas the rate correction also tries to align the different rates of the clocks in the system, thereby keeping the clocks closer to each other (reducing the amount of offset correction necessary, thereby increasing the available bandwidth due to decreased inter-transmission gaps). Typically the clocks are corrected at the end of a cycle, or at the end of two cycles in case of rate and offset correction, because two measurement values are necessary to calculate a rate deviation.

All sync(hronization) nodes have to transmit synchronization frames during one of their assigned slots on both channels C1, C2 at the same time, so that all nodes, including nodes connected to only one channel, receive the same time information.

A system matching the above description is disclosed in Hermann Kopetz et al., "Specification of the TTP/C protocol" (draft 0.5), TTTech Computertechnik AG, July 1999 (cf. http://www.tttech.com/) or in R. Belschner et al., "FlexRay - Requirements Specification" (version 2.0.2), FlexRay Consortium, April 2002 (cf. http://www.flexray.com/); the corresponding prior art document WO 03/010611 A1 relates to the FlexRay protocol and discloses a clock synchronization in distributed systems, in particular for a FlexRay (RTM) automotive communication system, involving a node subset, message reception, time and clock rate deviation measurement, offset correction value calculation as well as clock rate correction value calculation and node clock adjustment.

Prior art document EP 1 355 459 A2 refers to a method for synchronizing clocks in a distributed communication system comprising at least one communication media and a number of nodes connected to the communication media; the nodes comprise the clocks. In order to provide a possibility of synchronizing the clocks of the nodes with a high precision and at the same time tolerating high deviations in clock rate, it is suggested that for synchronizing the clocks differences in offset of the clocks as well as differences in rate of the clocks are corrected.

Apart from that, prior art document JP 2003-195903 discloses a duplicated communication module device; however, this prior art document does not refer to the fault-tolerant distributed synchronization of a duplicated system.

Despite all efforts as described above, the problem remains that if a dual-channel communication controller becomes faulty, access to both of its attached channels is prevented, thereby cutting the application host of the communication controller off from the communication. By having only a single communication controller handle both channels, several single points of failure arise; for example, the clock synchronization is implemented and used only once.

Starting from the disadvantages and shortcomings as described above and taking the prior art as discussed into account, an object of the present invention is to maintain access of the application host to at least one of the channels attached to the communication controller in case this communication controller fails or breaks down.

The object of the present invention is achieved by a method comprising the features of claim 1 as well as by a node comprising the features of claim 5.

Advantageous embodiments and expedient improvements of the present invention are disclosed in the respective dependent claims.

The gist of the present invention refers to the concept of combining a dual-channel clock synchronization with a single-channel based architecture for fault-tolerant time-triggered communication systems. In this context, the communication controllers of each channel preferably employ some kind of fault-tolerant clock correction mechanism among each other applying correction to both the offset and the rate.

The single-channel architecture according to the present invention, i. e. the approach to separate the communication controller into two independent entities has the advantage that each of the entities (= so-called single-channel communication controllers) is able to continue working if the other half fails; in other words: the at least two single-channel communication controllers communicate with each other in a way that the failure of one single-channel communication controller cannot prevent the other single-channel communication controller from further communicating.

In addition, it will be appreciated by the artisan that in order to enable a communication node to interact with the other - possibly conventional - communication controllers, no radically differing behaviour may be shown on the channels (while the interface to the application may be different). Such solution can be used
- to provide two different chips, one communication controller for each channel, or
- to integrate the two single-channel communication controllers into a single chip to reduce costs for those communication systems which require less fault tolerance.

According to a preferred embodiment of the present invention, two different mechanisms of the single-channel communication controllers have to communicate, namely the startup and the clock synchronization. By dividing the communication controller into two (nearly) independent entities, two different fault domains are provided; so if one of the new communication controllers has a fault, the other one continues to work. But this mechanism can only work together with the conventional dual-channel communication controllers if the combined behaviour of the two single-channel communication controllers does not deviate too much from the behaviour of a dual-channel communication controller. Especially the clocks of both communication controllers should be closely aligned (which is given for a dual-channel communication controller because a dual-channel communication controller comprises only one clock anyway).

The mechanism proposed by the present invention keeps already closely aligned channels closely aligned, while it does not necessarily synchronize channels that deviate from each other by, for instance, half a cycle length or more. In combination therewith or independently thereof, the communication controllers of each channel preferably employ some kind of fault-tolerant clock correction mechanism among each other.

Consequently, the present invention realizes the concept of keeping the channels "as synchronous as necessary", i. e. each single-channel communication controller closely follows the timing difference between itself and its attached single-channel communication controller of the opposite channel. A first option of achieving this is via at least one dedicated interface by which the two single-channel communication controllers mutually measure the time of the respectively other single-channel communication controller in relation to the respectively own time. This can be advantageously employed via at least one dedicated signal line which signals the local cycle start to the other single-channel communication controller.

Depending on the clock synchronization algorithm applied, a signal just before the correction phase might provide better results but also other points in time are possible. It is only necessary that the attached single-channel communication controller knows when it has to expect the signal. Then the attached single-channel communication controller can calculate the clock offset from the difference between the expected signal and the actual signal. Via this signal, the offset as well as the rate difference between the two single-channel communication controllers can be calculated.
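The measurement over the dedicated signal line can be sketched as follows. This is a minimal illustration, assuming that each controller timestamps the partner's signal in its own local ticks; the function name `measure_partner`, the sign convention (observed minus expected) and the tick values are hypothetical.

```python
from typing import List, Sequence, Tuple


def measure_partner(expected: Sequence[int], observed: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Derive offsets and rate differences from the partner's signal.

    expected[n]: local time at which the signal of cycle n was expected.
    observed[n]: local time at which it actually arrived.
    """
    # Offset: how far the actual signal deviates from the expected one.
    offsets = [obs - exp for exp, obs in zip(expected, observed)]
    # Rate difference: change of the offset from one cycle to the next.
    rate_differences = [later - earlier for earlier, later in zip(offsets, offsets[1:])]
    return offsets, rate_differences


# The partner signals 2, 5 and 8 ticks late: its clock runs 3 ticks per cycle slower.
offsets, rate_differences = measure_partner([1000, 2000, 3000], [1002, 2005, 3008])
```

A single signal per cycle is therefore enough for both quantities, provided the receiving controller knows when the signal is due.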

Another option is to directly exchange a numerical value incorporating more information regarding the local clock and the channel.

The two associated single-channel communication controllers now know the clock difference they have to each other. All single-channel communication controllers know the clock difference to their local counterpart. They all calculate in an identical way an additional, signed inter-channel correction.

The present invention further relates to a computer program being able to run on at least one computer, in particular on at least one microprocessor, and being programmed in order to execute a method as described above.

According to a preferred embodiment of the present invention, the computer program can be stored on at least one R[ead]O[nly]M[emory], on at least one R[andom]A[ccess]M[emory] or on at least one flash memory.

The present invention further relates to a distributed communication system with a number of nodes as described above, wherein said communication system is fault-tolerant and/or time-triggered.

The present invention finally relates to the use of the method as described above and/or of at least one computer program as described above and/or of at least one node as described above and/or of the communication system as described above for synchronizing clocks in an at least dual-channel environment wherein differences in offset of the clocks as well as differences in rate of the clocks can be corrected.

All in all, the presented mechanisms enable a scalable architecture concept based on single-channel communication units. Thereby, this concept allows building system architectures with different levels of fault tolerance. Furthermore, it provides all freedom for product decisions.

The same functional unit can be implemented as a single-channel I[ntegrated]C[ircuit] or, without any functional changes, can be combined to a redundant dual-channel I[ntegrated]C[ircuit]. The concept according to the present invention even supports product options that allow for using the two communication controllers of a chip to participate in different communication clusters. For such application, the inter-channel interface is simply disabled. Each communication unit is fully functional to operate as a single unit in a cluster on its own.

As already discussed above, there are several options to embody as well as to improve the teaching of the present invention in an advantageous manner. To this aim, reference is made to the claims respectively dependent on claim 1, on claim 3 and on claim 5; further improvements, features and advantages of the present invention are explained below in more detail with reference to three preferred embodiments by way of example and to the accompanying drawings (cf. Fig. 2A to Fig. 3B) where

Fig. 1 schematically shows a network system according to the prior art;

Fig. 2A schematically shows a first embodiment of a fault-tolerant time-triggered network system according to the present invention which works according to the method of the present invention;

Fig. 2B schematically shows a second embodiment of a fault-tolerant time-triggered network system according to the present invention which works according to the method of the present invention;

Fig. 2C schematically shows a third embodiment of a fault-tolerant time-triggered network system according to the present invention which works according to the method of the present invention;

Fig. 3A schematically shows a diagram of the clock information exchange between the two single-channel communication controllers according to the present invention where the measurement of their offset as well as the change of the cycle length are illustrated as a function of the time t; and

Fig. 3B schematically shows a diagram of the clock information exchange between the two single-channel communication controllers according to the present invention where the measurement of their offset and of their rate differences as well as the change of the cycle length are illustrated as a function of the time t.

The same reference numerals are used for corresponding parts in Fig. 2A to Fig. 3B.

In conventional architectures (cf. Figure 1), to save costs only a single communication controller CC is assigned to each node N to handle the two channels C1, C2 that are needed for redundancy reasons. Nevertheless, this approach is error-prone to the extent that a single error in the communication controller CC disables the bus access of this node N to both channels C1, C2.

Whereas earlier communication controllers CC according to the prior art had a single clock synchronization section and hence were not fault-tolerant (cf. Figure 1), the present invention describes a distributed communication system as well as a method for having independent clock synchronization and clock correction for each communication controller 30, 32 (cf. Figure 2A, Figure 2B and Figure 2C, where the respective schematic of a first embodiment of a node 100, of a second embodiment of a node 100', and of a third embodiment of a node 100" is shown).

This difference between the dual-channel based architecture according to the prior art (cf. Figure 1) and the single-channel based architecture according to the invention (cf. Figure 2A, Figure 2B and Figure 2C) is implemented in that the protocol engine 50, 52 of the communication controller 30, 32 serving the different channels 10, 20 is essentially doubled and can thereby be built into independent devices.

In this context, the fault-tolerant time-triggered system according to the invention comprises two channels 10, 20 to which respective nodes are connected. Each of these nodes comprises a respective bus driver 12, 22, a respective communication controller 30, 32 with the respective protocol engine 50, 52 and a respective controller host interface 40, 42, optionally a respective bus guardian device for each bus driver 12, 22 and the application host 60. The respective bus driver 12, 22 transmits the bits and bytes, which are provided by the respective communication controller 30, 32, onto its respective connected channel 10, 20 and in turn provides the respective communication controller 30, 32 with the respective information received on the respective channel 10, 20.

The respective communication controller 30, 32 is connected to the respective channel 10, 20 via its respective bus driver 12, 22, delivers relevant data to the host application 60, and receives data from the host application 60, which it in turn assembles into frames and delivers to the respective bus driver 12, 22.

The respective bus driver 12, 22, the (optional) bus guardian and the host device 60 are at least partially time-triggered, meaning that the time is sliced into recurring cycles where each cycle comprises several segments. Each node determines the start of a new cycle according to its own built-in clock. At least one segment is divided into a fixed number of slots, where each slot is assigned to at most one respective communication controller 30, 32, and that respective communication controller 30, 32 alone has the right to transmit in it. Other segments of the cycle can be used for dynamic arbitration schemes or for other purposes. The bus guardian is a device with an independent set of configuration data that enables the transmission on the bus only during those slots which are specified by the configuration set.

The host application 60 contains the data source as well as data sink and is generally not concerned with the protocol activity. Only decisions that the respective communication controller 30, 32 cannot do alone are made by the host application 60.

With respect to the present invention as illustrated in Figures 2A to 3B, it will be appreciated by the artisan that since even the dual-channel based architecture according to the prior art (cf. Figure 1) has to cope with two quasi-independent channels 10, 20, the additional effort in logic for the single-channel based architecture of Figures 2A, 2B, 2C is nearly negligible, i. e. only a handful of mechanisms have to be doubled.

More specifically, the redundant communication channel can be based on the single-channel architecture using two separated instances 30, 32 (cf. Figure 2B) or an on-chip implementation within a single unit (cf. Figure 2C). Correspondingly, the local intra-channel communication interface is a chip-external interface 54 (cf. Figure 2B) or an on-chip interface 56 (cf. Figure 2C), respectively.

The nodes have to be synchronized to each other because each node derives on its own the start of the cycle and thereby the placement of all segments and slots in time. Each node has its own clock so as not to depend on a single master clock whose failure would collapse the whole system. The difference between its own clock and the clocks of some subset of nodes of the system, which are called synchronization nodes (or sync nodes), is used to correct its own clock in a fault-tolerant way.

Two kinds of clock correction are possible, pure offset correction and offset correction and rate correction combined. The offset correction only corrects the clock offset whereas the rate correction also tries to align the different rates of the clocks in the system, thereby keeping the clocks closer to each other (reducing the amount of offset correction necessary, thereby increasing the available bandwidth due to decreased inter transmission gaps). Typically the clocks are corrected at the end of a cycle, or at the end of two cycles in case of rate and offset correction because two measurement values are necessary to calculate a rate deviation.

All sync(hronization) nodes have to transmit synchronization frames during one of their assigned slots on both channels 10, 20 at the same time, so that all nodes, including nodes connected to only one channel, receive the same time information.

In order to apply the synchronization between the two communication controllers 30, 32, each communication controller 30, 32 closely follows the timing difference between itself and its attached single-channel communication controller of the opposite channel; in particular, each single-channel communication controller 30, 32 measures the other's time in relation to its own time and calculates the offset and rate difference by one of two possible methods, and the fault-tolerant parameters are calculated using the method provided for both rate and offset corrections, for instance by direct exchange of a numerical value incorporating more information regarding the local clock and the channel.

Concerning the mathematical nomenclature with respect to the system,
- be $z$ the length of one cycle,
- be $C$ the set of all communication controllers 30, 32,
- be $A$ the set of all communication controllers 30 of channel 10,
- be $B$ the set of all communication controllers 32 of channel 20,
- be $A_s$ the set of all communication controllers 30 configured to transmit synchronization frames of channel 10 ($A_s \subset A$), and
- be $B_s$ the set of all communication controllers 32 configured to transmit synchronization frames of channel 20 ($B_s \subset B$).

In the single-channel architecture of Figures 2A, 2B, 2C, for each communication controller $i \in A_s$ there exists a unique communication controller $j \in B_s$ such that both belong to the same node, and vice versa. Essentially, this means that there exists a bijective mapping $s$ between $A_s$ and $B_s$, where for each $i \in A_s$, $s(i) \in B_s$ is the communication controller of channel 20 within the same node as $i$ (and for each $j \in B_s$, $s^{-1}(j) \in A_s$ is the communication controller of channel 10 within the same node as $j$).

Be $a, b \in \mathbb{R}^+$ the damping factors, $a$ for intra-channel clock correction and $b$ for inter-channel clock correction; $a$ and $b$ may be chosen separately for offset correction and rate correction (for simplification, this is not reflected in the following formulas).

Be $T_i(t)$ the real time of the cycle time $t$ of communication controller $i \in C$. Thereby $T_i(0)$ is the real time of when communication controller $i$ thinks cycle one shall start and $T_i(z)$ is the real time of when communication controller $i$ thinks cycle one ends and cycle two starts, and so on.

Finally, be $\tau_i$ the rate of the node $i \in C$ so that $T_i(t) = T_i(0) + \tau_i t$.
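As a short worked consequence of this linear clock model, the following derivation shows why a single offset measurement cannot separate offset from rate, while two measurements taken one cycle length $z$ apart can. The measurement point $x$ and the omission of the measurement error are simplifying assumptions made only for this illustration.

```latex
\begin{align*}
T_i(x) - T_j(x) &= \bigl(T_i(0) - T_j(0)\bigr) + (\tau_i - \tau_j)\,x \\
T_i(x+z) - T_j(x+z) &= \bigl(T_i(0) - T_j(0)\bigr) + (\tau_i - \tau_j)\,(x+z) \\
\bigl[T_i(x+z) - T_j(x+z)\bigr] - \bigl[T_i(x) - T_j(x)\bigr] &= (\tau_i - \tau_j)\,z
\end{align*}
```

The initial offset cancels in the last line, so the difference of two consecutive measurements depends only on the rate difference of the two clocks.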

With respect to the simple exchange, firstly the algorithm for pure offset correction will be described:

Each node measures the difference between its own clock and the clocks of all observable nodes. This is done in FlexRay by comparing the arrival time of an incoming sync(hronization) frame with the expected arrival time.

Be $i \in A_s$. Then for each observable communication controller $j \in A_s$, $j \neq i$, communication controller $i$ measures each cycle the offset

All nodes $i \in B_s$ do likewise.

It depends on the system configuration when within one cycle the offset measurement between a certain pair of communication controllers is taken, so it is represented by $x_{i,j}$. $M_{i,j}$ is the difference in offset between communication controller $i$ and communication controller $j$ within cycle one after the last correction, measured in the local time of communication controller $i$ and flawed by the measurement error $\varepsilon$.

Additionally the following measure is also taken:

All nodes $i \in B_s$ do likewise.

These are the offsets to the other communication controller within the same node.

According to the present invention, now the offset correction term for communication controller i is calculated by

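$$\text{offset correction term of } i \;=\; \frac{1}{a}\,FT\bigl(\{M_{i,j}\}_{j \in A_s,\, j \neq i}\bigr) \;+\; \frac{1}{b}\,M_{i,s(i)}$$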

where FT is a fault-tolerant offset calculation algorithm. Examples for such algorithms can be found in Fred B. Schneider, "Understanding Protocols for Byzantine Clock Synchronization", Cornell University, Ithaca, New York, August 1987. The preferred variant is the F[ault-]T[olerant]M[idpoint] algorithm. All nodes i e B_s do likewise.

It can be proven that this algorithm works and the clocks of both channels 10, 20 converge, if $1 - 1/a - 2/b > 0$.

Good values for a and b are

a    b
2    4
1.5  6

Especially a = 2 and b = 4 is extremely advantageous for implementation (no A[rithmetic-]L[ogic]U[nit] for division required, a simple shifting suffices) and is the preferred choice.

With respect to the simple exchange, now the algorithm for offset correction and rate correction will be described:

Each node measures the difference between its own clock and the clocks of all observable nodes. This is done in FlexRay by comparing the arrival time of an incoming sync(hronization) frame with the expected arrival time.

Be $i \in A_s$. Then for each observable communication controller $j \in A_s$, $j \neq i$, communication controller $i$ measures each cycle the offset

All nodes $i \in B_s$ do likewise. It depends on the system configuration when within one cycle the offset measurement between a certain pair of communication controllers is taken, so it is represented by $x_{i,j}$.

$M^1_{i,j}$ is the difference in offset between communication controller $i$ and communication controller $j$ within cycle one after the last correction, measured in the local time of communication controller $i$ and flawed by the measurement error $\varepsilon$. $M^2_{i,j}$ is the difference in offset between communication controller $i$ and communication controller $j$ within cycle two after the last correction, measured in the local time of communication controller $i$ and flawed by the measurement error $\varepsilon$.

Additionally the two following measures are also taken:

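$$M^1_{i,s(i)} = T_i\bigl(x_{i,s(i)}\bigr) - T_{s(i)}\bigl(x_{i,s(i)}\bigr) + \varepsilon$$

$$M^2_{i,s(i)} = T_i\bigl(x_{i,s(i)} + z\bigr) - T_{s(i)}\bigl(x_{i,s(i)} + z\bigr) + \varepsilon$$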

All nodes i e B_s do likewise.

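According to the present invention, now the offset correction term for communication controller $i$ is calculated by

$$\text{offset correction term of } i \;=\; \frac{1}{a}\,FT\bigl(\{M^2_{i,j}\}_{j \in A_s,\, j \neq i}\bigr) \;+\; \frac{1}{b}\,M^2_{i,s(i)}$$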

where FT is a fault-tolerant offset calculation algorithm. Examples for such algorithms can be found in Fred B. Schneider, "Understanding Protocols for Byzantine Clock Synchronization", Cornell University, Ithaca, New York, August 1987. The preferred variant is the F[ault-]T[olerant]M[idpoint] algorithm. All nodes $i \in B_s$ do likewise.

The rate correction term for communication controller $i$ is calculated by

where FT is a fault-tolerant offset calculation algorithm. Examples for such algorithms can be found in Fred B. Schneider, "Understanding Protocols for Byzantine Clock Synchronization", Cornell University, Ithaca, New York, August 1987. The preferred variant is the F[ault-]T[olerant]M[idpoint] algorithm. All nodes $i \in B_s$ do likewise.

It can be proven that this algorithm works and the clocks of both channels 10, 20 converge, if $1 - 1/a - 2/b > 0$.

Good values for a and b are

a    b
2    4
1.5  6

Especially a = 2 and b = 4 is extremely advantageous for implementation (no A[rithmetic-]L[ogic]U[nit] for division required, a simple shifting suffices) and is the preferred choice.

In the following, an example for the algorithm for offset correction and rate correction is given:

Be FT the F[ault-]T[olerant]M[idpoint] algorithm. The FTM can tolerate up to $k$ Byzantine failures if more than $2k+1$ measurements are given.

The FTM algorithm sorts the passed values and removes the k lowest values and the k highest values. It then chooses the remaining highest value and the remaining lowest value and calculates the average of both.
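The steps just described can be sketched as follows. This is a minimal illustration; the function name `fault_tolerant_midpoint` and the sample values are hypothetical, and the guard only ensures that at least one value survives the trimming, which is weaker than the measurement count the text requires for tolerating k Byzantine failures.

```python
from typing import Sequence


def fault_tolerant_midpoint(values: Sequence[float], k: int) -> float:
    """Drop the k lowest and k highest values, then average the remaining extremes."""
    if k < 0:
        raise ValueError("k must not be negative")
    if len(values) <= 2 * k:
        # Structural guard only: something must remain after trimming.
        raise ValueError("need more than 2k values")
    ordered = sorted(values)
    kept = ordered[k:len(ordered) - k]
    # Midpoint of the lowest and highest remaining value.
    return (kept[0] + kept[-1]) / 2


# One faulty node reports -50 and another reports 900; with k = 1 both are discarded.
midpoint = fault_tolerant_midpoint([3, -50, 1, 900, 4, 2], k=1)
```

Unlike a plain average, the result cannot be pulled outside the range of the correct measurements by up to k arbitrarily wrong values on either side.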

Concerning the offset correction of node $i$, after removing the high values and low values, the measured value of the offset difference of node $L_i$ is the lowest one and the measured value of the offset difference of node $H_i$ is the highest one.

It follows:

$$\frac{1}{a}\,FTM\bigl(\{M^2_{i,j}\}_{j \in A_s,\, j \neq i}\bigr) = \frac{M^2_{i,L_i} + M^2_{i,H_i}}{2a}$$

With a = 2 and b = 4, it follows for the calculation of the correction term:

For rate correction and the FTM, the result is similar.
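The remark that a simple shift suffices for a = 2 and b = 4 can be illustrated as follows. The sketch assumes that the correction term has the form (lowest + highest) / (2a) + partner / b, i.e. the FTM result damped by a plus the inter-channel offset damped by b; this form is an interpretation of the damping factors, not a formula quoted from the text, and the function names and integer tick values are hypothetical.

```python
def offset_correction_shift(m_low: int, m_high: int, m_partner: int) -> int:
    """Correction for a = 2, b = 4 using only an addition and a shift.

    (m_low + m_high) / (2 * 2) + m_partner / 4 == (m_low + m_high + m_partner) / 4
    """
    # Right shift by two bits divides by four, rounding toward minus infinity.
    return (m_low + m_high + m_partner) >> 2


def offset_correction_reference(m_low, m_high, m_partner, a=2.0, b=4.0):
    """Same assumed formula with explicit divisions, for comparison."""
    return (m_low + m_high) / (2 * a) + m_partner / b
```

For sums that are not multiples of four the shifted result is the floor of the exact value, for example 5 becomes 1 and -5 becomes -2, which a hardware implementation would have to accept or compensate.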

Furthermore, Figure 3A and Figure 3B show examples of how the measurement between the two associated communication controllers 30, 32 is performed for the simple exchange:

Figure 3A shows how the single-channel communication controller 30 of the first channel 10 and the single-channel communication controller 32 of the second channel 20 can exchange their clock information via one signal each. The two communication controllers 30, 32 measure their offset and change the length of the cycle c to compensate it (--> cycle boundaries bw with correction compared to cycle boundaries bo without correction; the difference between bw and bo is the correct offset co). A function f is used to make this mechanism fault-tolerant. If the propagation delay of the signal is known, it can be compensated for additional accuracy.
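The mutual measurement and cycle-length change can be illustrated with a small simulation of the two controllers of one node. It is a sketch under strong assumptions: ideal clocks with equal rates, no measurement error or propagation delay, no intra-channel correction, and a function f that simply divides the measured offset by the damping factor b. The name `simulate_pair` and all numeric values are hypothetical.

```python
from typing import List


def simulate_pair(initial_offset: float, b: float, cycles: int,
                  nominal: float = 1000.0) -> List[float]:
    """Return the offset (cycle start of 32 minus cycle start of 30) after each cycle."""
    start_30, start_32 = 0.0, float(initial_offset)
    history = [start_32 - start_30]
    for _ in range(cycles):
        # Each controller measures its own cycle start relative to the partner's.
        measured_by_30 = start_30 - start_32
        measured_by_32 = start_32 - start_30
        # Each shortens or lengthens its next cycle by the damped offset.
        start_30 += nominal - measured_by_30 / b
        start_32 += nominal - measured_by_32 / b
        history.append(start_32 - start_30)
    return history


history = simulate_pair(initial_offset=8, b=4, cycles=3)
```

Because both controllers correct toward each other, the offset shrinks by the factor (1 - 2/b) per cycle in this idealized setting, which is where the 2/b contribution of the inter-channel term comes from.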

Figure 3B shows how the single-channel communication controller 30 of the first channel 10 and the single-channel communication controller 32 of the second channel 20 can exchange their clock information via one signal each. The two communication controllers 30, 32 measure their offset differences as well as rate differences and change the length of the cycle c to compensate it or them (--> cycle boundaries bw with correction compared to cycle boundaries bo without correction; the difference between bw and bo is the correct offset / correct rate cor). Functions f and g are used to make this mechanism fault-tolerant. If the propagation delay of the signal is known, it can be compensated for additional accuracy.

Concerning the properties, the above-described algorithm is quick and does not require a complex additional interface between two associated communication controllers 30, 32.

Since the measurement of the time difference of the one communication controller 30 (or 32) to its associated communication controller 32 (or 30) can be done during the normal communication cycle c, the calculation of the corrections need not be delayed. However, the achievable precision may be seriously decreased compared to a conventional approach. Especially non-sync(hronization) nodes of different channels 10, 20 are subject to a potentially high clock difference.

The damping factors a and b shall be configurable instead of being fixed to the optimal choice, in order to remain compatible with a single-channel system, in which a choice of one is optimal (no additional term from the second channel 20 has to be incorporated).

With respect to the complex exchange, the algorithm for pure offset correction will be described first: Each node measures the difference between its own clock and the clocks of all observable nodes. This is done in FlexRay by comparing the arrival time of an incoming sync[hronization] frame with the expected arrival time.
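The measurement step can be sketched as follows. This is an illustrative Python fragment only: the function name `measure_offsets`, the dictionary layout, the time units and the sign convention (actual minus expected arrival time) are assumptions, not taken from the description or from the FlexRay specification.

```python
def measure_offsets(observations):
    """Compute one offset per observable node.

    observations maps a node identifier to a pair
    (actual_arrival_time, expected_arrival_time) of its sync frame,
    both expressed in the local time of the measuring controller.
    """
    return {
        node: actual - expected
        for node, (actual, expected) in observations.items()
    }


# Hypothetical sync frames seen by one controller within a cycle.
observed = {"node_2": (1003, 1000), "node_3": (1998, 2000), "node_4": (3000, 3000)}
print(measure_offsets(observed))
```

A positive value here means the frame arrived later than expected; the opposite convention would work equally well as long as it is used consistently.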

Let i ∈ A_s. Then for each observable communication controller j ∈ A_s, j ≠ i, communication controller i measures in each cycle the offset

All nodes i ∈ B_s do likewise.

It depends on the system configuration when, within one cycle, the offset measurement between a certain pair of communication controllers is taken, so this instant is represented by x_{i,j}. Δ_{i,j} is the difference in offset between communication controller i and communication controller j within cycle one after the last correction, measured in the local time of communication controller i and flawed by the measurement error ε.

According to the present invention, the following terms are now calculated for each communication controller i ∈ A_s by

where FT is a fault-tolerant offset calculation algorithm. Examples of such algorithms can be found in Fred B. Schneider, "Understanding Protocols for Byzantine Clock Synchronization", Cornell University, Ithaca, New York, August 1987. The preferred variant is the F[ault-]T[olerant]M[idpoint] algorithm. The same is done for all communication controllers i ∈ B_s:

Each node i ∈ A_s now transmits its correction term δ_i^offset to its associated communication controller s(i) and receives in turn δ_{s(i)}^offset from the associated communication controller s(i). Now, communication controller i can calculate its offset correction term

All nodes i ∈ B_s do likewise.

It can be proven that this algorithm works and the clocks of both channels 10, 20 converge, if 1 - 1/a - 1/b > 0 and a = b. Good values for a and b are:

a    b
2    2
4    4

In particular, a = 2 and b = 2 is extremely advantageous for implementation (no A[rithmetic-]L[ogic]U[nit] for division is required; a simple shift suffices) and is the preferred choice.

With respect to the complex exchange, the algorithm for offset correction and rate correction will now be described:

Each node measures the difference between its own clock and the clocks of all observable nodes. This is done in FlexRay by comparing the arrival time of an incoming sync[hronization] frame with the expected arrival time. Let i ∈ A_s. Then for each observable communication controller j ∈ A_s, j ≠ i, communication controller i measures in each cycle the offset

All nodes i ∈ B_s do likewise.

It depends on the system configuration when, within one cycle, the offset measurement between a certain pair of communication controllers is taken, so this instant is represented by x_{i,j}.

Δ¹_{i,j} is the difference in offset between communication controller i and communication controller j within cycle one after the last correction, measured in the local time of communication controller i and flawed by the measurement error ε.

Δ²_{i,j} is the difference in offset between communication controller i and communication controller j within cycle two after the last correction, measured in the local time of communication controller i and flawed by the measurement error ε.

According to the present invention, the following terms are now calculated for each communication controller i ∈ A_s by

where FT is a fault-tolerant offset calculation algorithm. Examples of such algorithms can be found in Fred B. Schneider, "Understanding Protocols for Byzantine Clock Synchronization", Cornell University, Ithaca, New York, August 1987. The preferred variant is the F[ault-]T[olerant]M[idpoint] algorithm. The same is done for all communication controllers i ∈ B_s:

Each node i ∈ A_s now transmits its correction terms δ_i^offset and δ_i^rate to its associated communication controller s(i) and receives in turn δ_{s(i)}^offset and δ_{s(i)}^rate from the associated communication controller s(i). Now, communication controller i can calculate its offset correction term

and its rate correction term

All nodes i ∈ B_s do likewise.

It can be proven that this algorithm works and the clocks of both channels 10, 20 converge, if 1 - 1/a - 1/b > 0 and a = b. Good values for a and b are:

a    b
2    2
4    4

In particular, a = 2 and b = 2 is extremely advantageous for implementation (no A[rithmetic-]L[ogic]U[nit] for division is required; a simple shift suffices) and is the preferred choice.
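The remark that a shift suffices can be illustrated with a short Python sketch. The function names are hypothetical, and Python integers are used here only to mimic an arithmetic right shift on a signed correction value; the actual hardware representation is not specified in the text.

```python
def halve_by_shift(delta):
    """Divide a signed integer correction term by 2 using a right shift."""
    # Python's >> on integers is an arithmetic shift, so it equals floor division by 2.
    return delta >> 1


def quarter_by_shift(delta):
    """Divide a signed integer correction term by 4 using a right shift."""
    return delta >> 2


for value in (6, -7, 10):
    # The shift result always matches floor division.
    assert halve_by_shift(value) == value // 2
    assert quarter_by_shift(value) == value // 4
```

Note that a shift rounds towards negative infinity, so negative correction terms are rounded down rather than towards zero.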

In the following, an example for the algorithm for offset correction and rate correction is given:

Let FT be the F[ault-]T[olerant]M[idpoint] algorithm. The FTM can tolerate up to k Byzantine failures if more than 2k+1 measurements are given. The FTM algorithm sorts the passed values and removes the k lowest values and the k highest values. It then chooses the remaining highest value and the remaining lowest value and calculates the average of both.
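The FTM steps just described (sort, discard the k lowest and k highest values, average the remaining extremes) can be sketched in Python as follows. The function name and the error handling are illustrative assumptions; the code only checks that at least one value remains after trimming, whereas the fault-tolerance condition stated in the text (more than 2k+1 measurements) is left to the caller.

```python
def fault_tolerant_midpoint(values, k):
    """Return the fault-tolerant midpoint of the measured values.

    values -- measured offset (or rate) differences
    k      -- number of values discarded at each end of the sorted list
    """
    if k < 0 or len(values) <= 2 * k:
        raise ValueError("not enough measurements to discard k values at each end")
    ordered = sorted(values)
    # Drop the k lowest and the k highest values.
    remaining = ordered[k:len(ordered) - k]
    # Average the lowest and the highest of the remaining values.
    return (remaining[0] + remaining[-1]) / 2


# Five measurements, one of them wildly off; with k = 1 the outlier is discarded.
print(fault_tolerant_midpoint([2, -3, 40, 1, 4], k=1))
```

In the example the sorted list is [-3, 1, 2, 4, 40]; after trimming, the values 1, 2 and 4 remain, and the midpoint of 1 and 4 is returned.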

Concerning the offset correction of node i, after removing the high values and low values, the measured value of the offset difference of node L_i is the lowest one and the measured value of the offset difference of node H_i is the highest one.

It follows:

In brackets is given not the term that has to be calculated within i, but the term that is really applied. For analysis of the behaviour of this algorithm this formula is helpful. With a = 2 and b = 2, it follows for the calculation of the correction term:

For rate correction and the FTM, the result is similar. Concerning the properties, the above-described algorithm is quite quick but requires an interface between the two associated communication controllers 30, 32 for exchanging the δ values.

Since the measurement of the time difference of the one communication controller 30 (or 32) to its associated communication controller 32 (or 30) cannot be done during the normal communication cycle, but is exchanged after the first FT calculation, the calculation of the correction terms is delayed through the additional necessary exchange of δ values and the ensuing additional calculation.

Especially the exchange costs some time because both communication controllers 30, 32 are subject to an offset. Both communication controllers 30, 32 can only finish the calculation of their correction terms when the slowest one has done so. This can only work if both communication controllers 30, 32 differ only within bounds that have to be guaranteed by the system start-up, otherwise the clock synchronization cannot "kick in". Once this condition is given, the clock synchronization algorithm can keep the associated controllers 30, 32 within these bounds.

The above-described algorithm has the advantage that the achievable precision is nearly the same as for conventional dual-channel architectures. Only non-sync[hronization] nodes are subject to a slightly higher clock difference. a and b have to be chosen identical to achieve the highest possible precision. Therefore only one configuration parameter shall be given for both. This configuration parameter shall still be configurable instead of being fixed to the optimal choice of two, in order to remain compatible with a single-channel system, in which a choice of one is optimal (no additional term from the second channel 20 has to be incorporated).

All in all, the present invention proposes a new way to synchronize the operation of two independent single-channel communication controllers 30, 32 - probably (cf. Figure 2C) but not necessarily (cf. Figure 2B) on the same chip - on different channels 10, 20 to virtually emulate the behaviour of a two-channel controller CC (cf. Figure 1), thereby enabling cost-effective I[ntegrated] C[ircuit] blocks to be created that can be used to generate either single-channel communication controllers or dual-channel communication controllers.

For the present invention, the communication controller 30, 32 is of primary importance. The bus driver 12, 22, the bus guardian and the host device 60 are listed to provide a full technical concept in which context the present invention might be used. The present invention is not limited or restricted by the presence or absence of those devices.

LIST OF REFERENCE NUMERALS

100 node (first embodiment of the present invention; cf. Figure 2A)

100' node (second embodiment of the present invention; cf. Figure 2B)

100" node (third embodiment of the present invention; cf. Figure 2C)

10 first channel

12 bus driver of the first channel 10

20 second channel

22 bus driver of the second channel 20

30 communication controller, in particular assigned to the first channel 10

32 communication controller, in particular assigned to the second channel 20

40 controller host interface, in particular assigned to the first communication controller 30

42 controller host interface, in particular assigned to the second communication controller 32

50 protocol engine, in particular assigned to the first communication controller 30

52 protocol engine, in particular assigned to the second communication controller 32

54 local intra-channel communication external interface (second embodiment of the present invention; cf. Figure 2B)

56 local intra-channel communication on-chip interface (third embodiment of the present invention; cf. Figure 2C)

60 application host

B1 bus driver of the first channel C1 (prior art; cf. Figure 1)

B2 bus driver of the second channel C2 (prior art; cf. Figure 1)

C1 first channel (prior art; cf. Figure 1)

C2 second channel (prior art; cf. Figure 1)

CC communication controller (prior art; cf. Figure 1)

CI controller host interface of the communication controller CC (prior art; cf. Figure 1)

H application host (prior art; cf. Figure 1)

N node (prior art; cf. Figure 1)

P protocol engine of the communication controller CC (prior art; cf. Figure 1)

bw cycle boundary with correction

bo cycle boundary without correction

c cycle

co correct offset

cor correct offset / correct rate

f function

Claims

CLAIMS:

1. Method for operating a distributed communication system, in particular for synchronizing clocks in such distributed communication system, with a number of nodes (100; 100'; 100") being interconnected by at least one communication link comprising at least two channels (10, 20), characterized in that each channel (10, 20) is controlled by its own communication controller (30, 32).

2. Method according to claim 1, characterized in that for synchronizing the clocks differences in offset of the clocks as well as differences in rate of the clocks are corrected.

3. Computer program being able to run on at least one computer, in particular on at least one microprocessor, characterized in that the computer program is programmed in order to execute a method according to claim 1 or 2.

4. Computer program according to claim 3, characterized in that the computer program is stored on at least one R[ead]O[nly]M[emory], on at least one R[andom]A[ccess]M[emory] or on at least one flash memory.

5. Node (100; 100'; 100") of a distributed communication system with a number of nodes (100; 100'; 100") being interconnected by at least one communication link comprising at least two channels (10, 20), characterized in that at least one communication controller (30, 32) is assigned to each channel (10, 20).

6. Node according to claim 5, characterized by means for synchronizing a clock of the node (100; 100'; 100"), said means correcting differences in offset of the clock as well as differences in rate of the clock.

7. Node according to claim 5 or 6, characterized by at least one bus guardian for controlling access of the communication controller (30, 32) to the communication link (10, 20), the corrected clock signal being made available to the bus guardian and the bus guardian adapting its clock accordingly.

8. Node according to at least one of claims 5 to 7, characterized by means for executing the method according to claim 1 or 2 and/or the computer program according to claim 3 or 4.

9. Distributed communication system with a number of nodes (100; 100'; 100") according to at least one of claims 5 to 8, characterized in that said communication system is fault-tolerant and/or time-triggered.

10. Use of the method according to claim 1 or 2 and/or of at least one computer program according to claim 3 or 4 and/or of at least one node (100; 100'; 100") according to at least one of claims 5 to 8 and/or of the communication system according to claim 9 for synchronizing clocks in an at least dual-channel environment wherein differences in offset of the clocks as well as differences in rate of the clocks can be corrected.