WO2006100630A1

WO2006100630A1 - Processing module, communication protocol for streaming and/or for synchronizing such processing module as well as method of streaming and/or of synchronizing such processing module

Info

Publication number: WO2006100630A1
Application number: PCT/IB2006/050843
Authority: WO
Inventors: Andrei Radulescu
Original assignee: Koninklijke Philips Electronics N. V.
Priority date: 2005-03-22
Filing date: 2006-03-20
Publication date: 2006-09-28

Abstract

In order to provide a processing module (100), in particular an I[ntellectual]P[roperty] core, for example an A[pplication]P[rogramming]I[nterface], for processing data via at least one communication channel (10), wherein at least one state of the communication channel (10) is alternated by executing at least one task (t1, t2, t3, t4), and wherein the processing module (100) is streamed and/or synchronized by at least one primitive (GD, GS, GT, PD, PS), in particular by at least one group of primitives (GD, GS, GT, PD, PS), wherein a higher level of abstraction is obtained, at least one further primitive (U) for restoring the state of the communication channel (10) to the state before alternation, in particular for deleting the state built up in the communication channel (10) by the task (t1, t2, t3, t4) during its execution, is proposed.

Description

PROCESSING MODULE, COMMUNICATION PROTOCOL FOR STREAMING AND/OR FOR SYNCHRONIZING SUCH PROCESSING MODULE AS WELL AS METHOD OF STREAMING AND/OR OF SYNCHRONIZING SUCH PROCESSING MODULE

The present invention relates to a processing module, in particular to an I[ntellectual]P[roperty] core, for example to an A[pplication]P[rogramming]I[nterface], for processing data via at least one communication channel, wherein at least one state of the communication channel is alternated by executing at least one task, and wherein the processing module is streamed and/or synchronized by at least one primitive, in particular by at least one group of primitives.

The present invention further relates to a communication protocol for streaming and/or for synchronizing such processing module, in particular such I[ntellectual]P[roperty] core, for example such A[pplication]P[rogramming]I[nterface]. The present invention further relates to a method of streaming and/or of synchronizing such processing module, in particular such I[ntellectual]P[roperty] core, for example such A[pplication]P[rogramming]I[nterface].

Modern embedded systems comprise a large number of processing modules, also called I[ntellectual]P[roperty] cores. On these processing modules, there are one or more tasks communicating with each other either directly or via a memory. These tasks and the communication between these tasks constitute one configuration, also called a user mode, of the system. In complex systems, for example in high-end T[ele]V[ision] chips, such as Philips' Nexperia Home S[ystem]o[n]C[hip] (also internally called Philips' Base Cat), there can be hundreds of configurations, each comprising more than a hundred tasks.

To simplify the task creation, high-level communication interfaces have been defined, addressing specific task characteristics, and different flexibility and efficiency requirements. For software tasks, the T[riMedia]S[oftware]S[treaming]A[rchitecture], the T[riMedia]S[oftware]S[treaming]A[rchitecture]-2, the C[PU]-[controlled]HE[terogeneous]A[rchitectures for signal]P[rocessing] (cf. prior art article "C-HEAP: A Heterogeneous Multiprocessor Architecture Template and Scalable and Flexible Protocol for the Design of Embedded Signal Processing Systems" by Andre Nieuwland, Jeffrey Kang, Om Prakash Gangwal, Ramanathan Sethuraman, Natalino Busa, Kees G. W. Goossens, Rafael Peset Llopis, and Paul Lippens, Journal of Design Automation for Embedded Systems, 7(3), 2002), the communication protocol Arachne (cf. prior art article "A protocol and memory manager for on-chip communication" by Kees G. W. Goossens, International Symposium on Circuits and Systems, pages 225 to 228, 2001; http://www.homepages.inf.ed.ac.uk/kgoossen/2001-iscas.pdf), and the Y[-Chart]A[pplication]P[rogramming]I[nterface] (cf. prior art article "YAPI: Application Modeling for Signal Processing Systems" by Erwin A. de Kock, Gerben Essink, W. J. M. Smits, Pieter van der Wolf, J.-Y. Brunel, Wido M. Kruijtzer, P. Lieverse, and K. A. Vissers, Design Automation Conference, 2000, cf. http://www.sigda.org/Archives/ProceedingArchives/Dac/Dac2000/papers/2000/dac00/pdffiles/23_3.pdf) have been defined and used. For hardware tasks, Eclipse has been defined and used. In this context, Eclipse is an architecture template for the design of versatile media-processing S[ystem]o[n]C[hip] subsystems.

Currently, there are efforts to reduce this diversity of protocols and to define one single protocol, namely T[ask]T[ransaction]L[ayer], covering all cases and requirements (cf. prior art article "Design and Programming of Embedded Multiprocessors: An Interface-Centric Approach" by Pieter van der Wolf, Erwin A. de Kock, Tomas Henriksson, Wido M. Kruijtzer, and Gerben Essink, IEEE Proceedings of Hardware/Software Codesign and System Synthesis, 2004).

In the following, prior art systems relating to channel restoring operations in IP core communication, i. e. A[pplication]S[pecific]I[ntegrated]C[ircuit]s, F[ield]P[rogrammable]G[ate]A[rray]s or similar circuits, are described.

In prior art document EP 1 154 601 A1 a routing system with a routing A[pplication]P[rogram]I[nterface] is disclosed. In this context an operation for "cleaning up" a channel, which is basically a reset of the channel, is described. Cleaning up is called as a result of an internal error preventing further communication.

In the prior art article "Communication Services for Networks on Chip" by Andrei Radulescu and Kees G. W. Goossens (cf. Domain-Specific Processors: Systems, Architectures, Modeling, and Simulation (SAMOS), Series Volume 20, 2002), N[etwork]o[n]C[hip] communication services, such as throughput/latency guarantees, ordering or guaranteed completion, are described. However, the present invention relates to a different class of communication services being at a higher level of abstraction than NoC services, and being able to use NoC services to be implemented.

Moreover, in the prior art article "Core Communication Interface for FPGAs" by Jose Carlos Palma, Aline Vieira de Mello, Leandro Möller, Fernando Moraes, and Ney Calazans (cf. Proceedings of the 15th Symposium on Integrated Circuits and Systems Design (SBCCI'02), 2002, pages 183 to 188), a communication interface to be used in F[ield]P[rogrammable]G[ate]A[rray]s is described. Again, low-level communication operations are described, which do not overlap with the class of operations the present invention belongs to.

Starting from the disadvantages and shortcomings as described above and taking the prior art as discussed into account, an object of the present invention is to further develop a processing module of the kind as described in the technical field, a communication protocol of the kind as described in the technical field as well as a method of the kind as described in the technical field, in such a way that an interface comprising a higher level of abstraction is provided, thus placing the present invention in the same context as T[ask]T[ransaction]L[ayer].

The object of the present invention is achieved by a processing module comprising the features of claim 1, by a communication protocol comprising the features of claim 2 as well as by a method comprising the features of claim 7. Advantageous embodiments and expedient improvements of the present invention are disclosed in the respective dependent claims.

The present invention, in particular the processing module according to the present invention as well as the method according to the present invention, is principally based on the idea that the state of the communication channel can be restored to the state before alternation, in particular that the state built up in the communication channel by the task during its execution can be deleted, by the at least one further primitive; in particular, the present invention is based on the idea of deleting the state built up in a channel during a task execution and of restoring the channel to its previous state by at least one further primitive, for example by minimizing a task's state before task switch.

Thus, the scope of the present invention refers to streaming and/or to synchronizing the processing module or processing device. By this technical measure, the time to market is decreased by increasing the reuse and development efficiency.

According to a preferred embodiment of the present invention the group of primitives comprises at least one first primitive for allocating at least part of memory space in the communication channel. This memory can be used by the data producer or producer task to prepare data to be sent to the channel.

Moreover, advantageously the group of primitives comprises at least one second primitive being used by the data producer for sending the data into the communication channel.

Beside that, advantageously the group of primitives comprises at least one third primitive being used by the data consumer or consumer task to obtain access to the data in the channel.

Independently thereof or in combination therewith, the group of primitives may comprise at least one fourth primitive being used by the consumer to release memory space, wherein the memory space is used for transferring the data.

The further primitive (Undo) helps in minimizing the task state by removing any state built in the channel. In this way, task switching is made faster. If the channel builds up state, which is not undone, task switching is still possible but takes longer as there is more state to be saved.

Moreover, in an advantageous embodiment of the present invention, the memory space being allocated for sending the data to the communication channel and/or the memory space being allocated for transferring the data is freed by the further primitive.

Using the present invention allows the same low-cost implementation as in the case of Eclipse. The channel state is adjusted back to the original state, allowing a quick and low-cost task switching. In addition to this, there is no need to have special semantics to the group of primitives, allowing seamless unification in standardization efforts, such as T[ask]T[ransaction]L[ayer].

The present invention further relates to a primitive for at least one processing module, in particular for at least one I[ntellectual]P[roperty] core, for example for at least one A[pplication]P[rogramming]I[nterface], the processing module being designed for processing data via at least one communication channel, wherein at least one state of the communication channel is alternated by executing at least one task, and wherein the primitive is designed for restoring the state of the communication channel to the state before alternation, in particular for deleting the state built up in the communication channel by the task during its execution.

Said further primitive advantageously relates to at least one T[ask]T[ransaction]L[ayer] transaction.

By the present invention, the further primitive U[ndo] is proposed for a communication interface such as TTL. The Undo primitive restores the communication channel state to a previous one, which helps in minimizing the task state to be saved on task switch. The main advantage of such an additional primitive is that the other communication primitives (G[et]S[pace], P[ut]D[ata], G[et]D[ata], P[ut]S[pace]) are allowed to have a common semantics, which eases the standardization efforts of TTL.

The present invention finally relates to the use of at least one processing module as described above and/or of at least one communication protocol as described above and/or of at least one primitive as described above and/or of the method as described above for at least one embedded system, in particular for at least one E[mbedded]S[ystem]A[rchitecture] on silicon, for example for at least one network-on-silicon, assigned to buses and/or to chip data transfer of a digital semiconductor audio/video platform. In addition to S[ystems]o[n]C[hip], the present invention also applies to multi-board systems, to multi-chip systems, and/or to multi-computer systems.

As already discussed above, there are several options to embody as well as to improve the teaching of the present invention in an advantageous manner. To this aim, reference is made to the claims respectively dependent on claim 2 and on claim 7; further improvements, features and advantages of the present invention are explained below in more detail with reference to preferred embodiments by way of example and to the accompanying drawings where

Fig. 1 schematically shows an embodiment of a processing module according to the prior art, namely of an A[pplication]P[rogramming]I[nterface], comprising four primitives for streaming and/or for synchronizing;

Fig. 2 schematically shows an embodiment of buffer management in the processing module of Fig. 1;

Fig. 3 schematically shows a first embodiment of task management according to communication protocols T[riMedia]S[oftware]S[treaming]A[rchitecture], C[PU]-[controlled]HE[terogeneous]A[rchitectures for signal]P[rocessing], and Arachne in the view of a data producer;

Fig. 4 schematically shows a first embodiment of task management according to communication protocols T[riMedia]S[oftware]S[treaming]A[rchitecture], C[PU]-[controlled]HE[terogeneous]A[rchitectures for signal]P[rocessing], and Arachne in the view of a data consumer;

Fig. 5 schematically shows a second embodiment of task management according to communication protocol Eclipse in the view of a data producer;

Fig. 6 schematically shows a second embodiment of task management according to communication protocol Eclipse in the view of a data consumer;

Fig. 7 schematically shows a third embodiment of task management according to the present invention using the further primitive "undo" in the view of a data producer; and

Fig. 8 schematically shows a third embodiment of task management according to the present invention using the further primitive "undo" in the view of a data consumer.

The same reference numerals are used for corresponding parts in Fig. 1 to Fig. 8.

To simplify the development of computational tasks, high-level communication interfaces have been defined in the prior art, addressing specific task characteristics, and different flexibility and efficiency requirements. Such communication interfaces include T[riMedia]S[oftware]S[treaming]A[rchitecture], C[PU]-[controlled]HE[terogeneous]A[rchitectures for signal]P[rocessing], Y[-Chart]A[pplication]P[rogramming]I[nterface], Arachne, and Eclipse.

Such a variety of protocols, each with different semantics, can introduce incompatibilities between tasks, and therefore make system integration more difficult. As a result, standardization is under way to unify these protocols under a single communication interface called T[ask]T[ransaction]L[ayer].

In these communication protocols, an A[pplication]P[rogramming]I[nterface] 100 seen by a respective task t1, t2, t3, t4 consists of four synchronization primitives GS, PD, GD, PS. These primitives GS, PD, GD, PS may have different names in the above-mentioned protocols; in the following the TTL terminology is used. Fig. 1 depicts a communication channel 10, namely a TTL channel, with relating communication primitives GS, PD, GD, PS.

The first primitive GS, namely GetSpace, allocates memory 42 in the communication channel 10. This memory 42 is used by task t1 to prepare data to be sent to the channel 10, wherein task t1 is assigned to a producer 20.

The second primitive PD, namely PutData, is used by the producer 20 to send data to the communication channel 10.

The third primitive GD, namely GetData, is used by a consumer 30 to obtain access to the data in the channel 10.

The fourth primitive PS, namely PutSpace, is used by the consumer 30 to release memory space 44 used for transferring the data.

Fig. 2 gives an example of implementation of TTL buffer management. As shown in Fig. 2, memory management for a buffer or memory 40 of the channel 10 consists of maintaining four pointers (with reference numerals A, B, C, D) indicating the begin and the end of the regions claimed for writing 42 and for reading 44. When no empty space e is claimed, the first pointer A and the second pointer B point to the same location. When no full space f is claimed, the third pointer C and the fourth pointer D point to the same location.
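The four-pointer bookkeeping described above can be sketched in a few lines of Python. This is only an illustrative model and not the implementation of Fig. 2: the class name `ChannelPointers`, the `capacity` field, the use of monotonically increasing counters (reduced modulo the capacity to obtain a buffer position), and the ordering invariant checked by `is_consistent` are assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class ChannelPointers:
    """Hypothetical model of the four pointers A, B, C, D of a channel buffer."""

    capacity: int  # number of items the buffer holds (assumed parameter)
    A: int = 0     # advanced when empty space is claimed for writing
    B: int = 0     # advanced when written data is released to the channel
    C: int = 0     # advanced when full space is claimed for reading
    D: int = 0     # advanced when read space is released as empty space

    def claimed_for_writing(self) -> int:
        # Zero when A and B point to the same location.
        return self.A - self.B

    def claimed_for_reading(self) -> int:
        # Zero when C and D point to the same location.
        return self.C - self.D

    def slot(self, counter: int) -> int:
        # Assumption of this sketch: counters only grow, and the physical
        # buffer position is the counter modulo the capacity.
        return counter % self.capacity

    def is_consistent(self) -> bool:
        # Assumed invariant: readers never pass committed data, and writers
        # never claim more than the buffer can hold.
        return self.D <= self.C <= self.B <= self.A <= self.D + self.capacity
```

With all four counters at zero, nothing is claimed on either side, which corresponds to the case in which A equals B and C equals D.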

In most of the cases, for instance TSSA, C-HEAP, and Arachne, the above-mentioned communication protocols work as follows (cf. Figs 3, 4 including the effects of the primitives in the four pointers A, B, C, D):

The first primitive G[et]S[pace] moves the first pointer A ahead in order to indicate that empty space is reserved for writing.

The second primitive P[ut]D[ata] moves the second pointer B ahead in order to indicate that full space has been released to the channel 10, i. e. that the data has been transferred.

The third primitive G[et]D[ata] moves the third pointer C ahead in order to indicate that more full space f has been claimed for reading.

The fourth primitive P[ut]S[pace] moves the fourth pointer D ahead in order to indicate that more empty space e is available in the channel 10.

Task swapped out is indicated by the reference numeral tso, and task swapped in is indicated by the reference numeral tsi.

For all functions of the API 100, the number of items that the pointer A, B, C, D is moved is specified as a parameter.
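The pointer movements listed above can be illustrated by the following minimal Python sketch, in which each primitive moves exactly one pointer by the number of items passed as a parameter. The class name `StatefulChannel`, the boolean return values, the bounds checks, and the monotonic counters are assumptions of this sketch and are not taken from TSSA, C-HEAP, or Arachne.

```python
class StatefulChannel:
    """Illustrative channel in which every primitive moves one pointer."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity  # assumed buffer size in items
        self.A = self.B = self.C = self.D = 0

    def get_space(self, n: int) -> bool:
        # Reserve n items of empty space for writing: move A ahead.
        if self.A + n - self.D > self.capacity:
            return False
        self.A += n
        return True

    def put_data(self, n: int) -> bool:
        # Release n written items to the channel: move B ahead.
        if self.B + n > self.A:
            return False
        self.B += n
        return True

    def get_data(self, n: int) -> bool:
        # Claim n items of full space for reading: move C ahead.
        if self.C + n > self.B:
            return False
        self.C += n
        return True

    def put_space(self, n: int) -> bool:
        # Return n read items to the channel as empty space: move D ahead.
        if self.D + n > self.C:
            return False
        self.D += n
        return True

    def pointers(self) -> tuple:
        return (self.A, self.B, self.C, self.D)


ch = StatefulChannel(capacity=4)
ch.get_space(3)
ch.put_data(3)
ch.get_data(2)
ch.put_space(2)
print(ch.pointers())  # (3, 3, 2, 2)
```

Returning `False` instead of blocking is a simplification; a real channel would typically offer blocking and non-blocking variants.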

In protocols such as TSSA, C-HEAP, and Arachne, G[et]S[pace] and G[et]D[ata] change the state of the channel 10, i. e. modify the first pointer A and the third pointer C. This also means that on two consecutive calls to G[et]S[pace] / G[et]D[ata], two consecutive empty e / data f regions in the channel 10 are returned.

In Fig. 3, TSSA, C-HEAP, and Arachne communication protocols are depicted in the view of the producer's 20 side, and in Fig. 4, TSSA, C-HEAP, and Arachne communication protocols are depicted in the view of the consumer's 30 side. As depicted in Figs 3 and 4 the channel builds up state.

The first primitive G[et]S[pace] moves the first pointer A in order to indicate that empty space e is reserved for writing. The first pointer A is moved forward relative to its current position with the indicated number of items.

The second primitive P[ut]D[ata] moves the second pointer B ahead in order to indicate that full space f (data) has been released to the channel 10.

The third primitive G[et]D[ata] moves the third pointer C in order to indicate that more full space f has been claimed for reading. The third pointer C is moved forward relative to its current position with the indicated number of items.

The fourth primitive P[ut]S[pace] moves the fourth pointer D ahead in order to indicate that more empty space e is available in the channel 10.

On task switching, the state of the task being swapped out must be saved (either explicitly or implicitly), including the built up channel state (changes to the first pointer A and to the third pointer C). Task state saving is known to be an expensive operation.
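The built-up channel state can be made concrete with a small Python sketch of the producer side only. The names `ProducerView` and `state_to_save`, the returned `(start, end)` index pairs, and the fixed pointer D are assumptions made for illustration.

```python
class ProducerView:
    """Producer-side sketch: G[et]S[pace] moves A, P[ut]D[ata] moves B."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.A = 0
        self.B = 0
        self.D = 0  # moved by the consumer; left fixed in this sketch

    def get_space(self, n: int):
        # Claim the next n empty items and return their (start, end) range.
        if self.A + n - self.D > self.capacity:
            return None
        start = self.A
        self.A += n
        return (start, self.A)

    def put_data(self, n: int) -> None:
        # Release n previously claimed items to the channel.
        if self.B + n > self.A:
            raise ValueError("cannot release more than was claimed")
        self.B += n

    def state_to_save(self) -> int:
        # Claimed but not yet released items: channel state that would have
        # to be saved if the task were swapped out now.
        return self.A - self.B


p = ProducerView(capacity=8)
first = p.get_space(2)
second = p.get_space(2)
print(first, second, p.state_to_save())  # (0, 2) (2, 4) 4
```

Two consecutive calls return two consecutive regions, and the distance between A and B is the state a task switch would have to preserve.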

Eclipse is one exception from these semantics because it works as follows (cf. Figs 5, 6 including the effects of the primitives in the four pointers A, B, C, D):

In Eclipse, the first primitive G[et]S[pace] does not move the first pointer A in the channel 10 as this would build up channel state. Instead, the first primitive G[et]S[pace] only checks if the first pointer A can be moved for the specified amount of items relative to the second pointer B (i. e. only checks if space can be claimed, which is an implicit claim).

As a result, the second primitive P[ut]D[ata] moves both the first pointer A and the second pointer B ahead. Actually, in this implementation only one of the first pointer A and of the second pointer B needs to be maintained.

Similarly to the first primitive G[et]S[pace], the third primitive G[et]D[ata] only checks if the third pointer C can be moved ahead relative to the fourth pointer D without actually moving it. As a result, the fourth primitive P[ut]S[pace] moves both the third pointer C and the fourth pointer D (with the option of optimizing away one of the third pointer C or of the fourth pointer D).

As there is no channel state being built up, a further primitive G[et]T[ask] does not need to delete any channel state, i. e. there is no moving back of the first pointer A and/or of the third pointer C. In Eclipse, on two consecutive calls to G[et]S[pace] / G[et]D[ata], the same two empty e / full (data) f regions in the channel 10 are returned, unless data / space is released to the channel 10 with P[ut]D[ata] / P[ut]S[pace].

In Fig. 5, Eclipse communication protocols are depicted in the view of the producer's 20 side, and in Fig. 6, Eclipse communication protocols are depicted in the view of the consumer's 30 side. As depicted in Figs 5 and 6 the channel builds up no state.
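The Eclipse-style semantics can be sketched in Python as follows. Because G[et]S[pace] and G[et]D[ata] only check, a single pointer per side suffices, as noted above. The class name `EclipseStyleChannel`, the attribute names `write_ptr` and `read_ptr`, and the returned `(start, end)` ranges are assumptions of this sketch and not Eclipse's actual interface.

```python
class EclipseStyleChannel:
    """Illustrative channel in which the Get primitives build up no state."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.write_ptr = 0  # stands for both A and B
        self.read_ptr = 0   # stands for both C and D

    def get_space(self, n: int):
        # Only check whether n empty items are available; move nothing.
        if self.write_ptr + n - self.read_ptr > self.capacity:
            return None
        return (self.write_ptr, self.write_ptr + n)

    def put_data(self, n: int) -> None:
        # Commit n items: this is the only producer call that moves a pointer.
        if self.get_space(n) is None:
            raise ValueError("not enough empty space")
        self.write_ptr += n

    def get_data(self, n: int):
        # Only check whether n full items are available; move nothing.
        if self.read_ptr + n > self.write_ptr:
            return None
        return (self.read_ptr, self.read_ptr + n)

    def put_space(self, n: int) -> None:
        # Release n items: the only consumer call that moves a pointer.
        if self.get_data(n) is None:
            raise ValueError("not enough data")
        self.read_ptr += n


ch = EclipseStyleChannel(capacity=4)
print(ch.get_space(2), ch.get_space(2))  # (0, 2) (0, 2)
```

Repeating `get_space` returns the same region until `put_data` is called, so a task that is switched out between the two calls leaves nothing behind in the channel.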

The semantics have been changed in Eclipse for the following reason: in this case a stateless task is obtained, which leads to a very fast task switch. In such a system, if the task has not completed before switching to another task, the next run of the task will just reproduce its previous run. In such a case, the channel 10 has to return to its original state, and consequently the channel 10 will return the same empty e / data f regions as in the previous task run.

In the case of a unified protocol, such as TTL, such changed semantics depending on the mode can be confusing and misleading for a user of the API 100. Therefore by the present invention an improved solution is proposed, which allows stateless tasks without changing the semantics of the existing API primitives GS, PD, GD, PS (cf. Figs 7, 8 including the effects of the primitives in the four pointers A, B, C, D).

The proposed solution is to introduce a further or additional primitive called Undo (reference numeral U), which restores the state of the channel 10 to the previous state, in particular which deletes the state built up in the communication channel 10 by the task t1, t2, t3, t4 during its execution. This is equivalent to deleting the state built up in the channel 10 by a task during its execution. Therefore, after a U[ndo] primitive has been applied to all channels 10 of a stateless task, a switch to another task can be safely performed.

The undo U semantics are to move the first pointer A back to the second pointer B, and the third pointer C back to the fourth pointer D. Several variants of the Undo() primitive can be defined:

Undo() restores the state of all opened channels 10 of the current task; that is, for each of the task's channels, it moves back the first pointer A and the third pointer C to their original place (second pointer B and fourth pointer D, respectively; cf. Figs 7, 8).

Undo(c) restores the state of channel c.

UndoSpace(c) restores the state related to the empty space in channel c; that is, it moves back the first pointer A to the location of the second pointer B.

UndoData(c) restores the state related to the full space in channel c; that is, it moves back the third pointer C to the location of the fourth pointer D.

Thus, by the present invention a further primitive for a processing module 100, in particular for an IP core communication interface, is provided; thereby, the class of communication services the present invention refers to is streaming and synchronization.
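The Undo variants can be sketched in Python on top of a channel whose four primitives keep their state-building semantics. The names `UndoableChannel`, `undo_space`, `undo_data`, `undo`, and `undo_all` are hypothetical stand-ins for UndoSpace(c), UndoData(c), Undo(c), and Undo(); the bounds checks and monotonic counters are likewise assumptions of this sketch.

```python
class UndoableChannel:
    """Illustrative channel with state-building primitives plus Undo."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.A = self.B = self.C = self.D = 0

    def get_space(self, n: int) -> bool:
        if self.A + n - self.D > self.capacity:
            return False
        self.A += n  # builds up state on the producer side
        return True

    def put_data(self, n: int) -> bool:
        if self.B + n > self.A:
            return False
        self.B += n
        return True

    def get_data(self, n: int) -> bool:
        if self.C + n > self.B:
            return False
        self.C += n  # builds up state on the consumer side
        return True

    def put_space(self, n: int) -> bool:
        if self.D + n > self.C:
            return False
        self.D += n
        return True

    def undo_space(self) -> None:
        # UndoSpace(c): move A back to the location of B.
        self.A = self.B

    def undo_data(self) -> None:
        # UndoData(c): move C back to the location of D.
        self.C = self.D

    def undo(self) -> None:
        # Undo(c): restore both sides of this channel.
        self.undo_space()
        self.undo_data()


def undo_all(channels) -> None:
    # Undo(): restore every open channel of the current task.
    for ch in channels:
        ch.undo()
```

Data already released with P[ut]D[ata] and space already released with P[ut]S[pace] are left untouched; only claims that were never released are rolled back, so the task holds no channel state when it is switched out.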

LIST OF REFERENCE NUMERALS

100 processing module, in particular I[ntellectual]P[roperty] core, for example A[pplication]P[rogramming]I[nterface]

10 communication channel

20 data producer

30 data consumer

40 memory space

42 part or region of the memory space 40 being allocated for sending the data to the communication channel 10, in particular memory space being claimed for writing

44 part or region of the memory space 40 being allocated for transferring the data, in particular memory space being claimed for reading

A first pointer indicating begin of part or region 42 being allocated for sending the data to the communication channel 10

B second pointer indicating end of part or region 42 being allocated for sending the data to the communication channel 10

C third pointer indicating begin of part or region 44 being allocated for transferring the data

D fourth pointer indicating end of part or region 44 being allocated for transferring the data

e empty memory space and/or free memory space

f full space and/or memory space being used by the data

GD third primitive GetData

GS first primitive GetSpace

PD second primitive PutData

PS fourth primitive PutSpace

t1 first task

t2 second task

t3 third task

t4 fourth task

tsi task swapped in

tso task swapped out

U further primitive Undo

Claims

CLAIMS:

1. A processing module (100) for processing data via at least one communication channel (10), wherein at least one state of the communication channel (10) is alternated by executing at least one task (t1, t2, t3, t4), and wherein the processing module (100) is streamed and/or synchronized by at least one primitive (GD, GS, GT, PD, PS), characterized by at least one further primitive (U) for restoring the state of the communication channel (10) to the state before alternation.

2. A communication protocol for streaming and/or for synchronizing at least one processing module (100) according to claim 1 and for communicating the data between at least one data producer (20) and at least one data consumer (30) via the communication channel (10).

3. The communication protocol according to claim 2, characterized in that more than one task (t1, t2, t3, t4) is provided and that the tasks (t1, t2, t3, t4) communicate with each other either directly or via using memory space (40).

4. The communication protocol according to claim 2 or 3, characterized in that the group of primitives (GD, GS, GT, PD, PS) comprises at least one first primitive (GS) for allocating at least part or region (42) of the memory space (40), said part or region (42) being designed for sending the data to the communication channel (10), at least one second primitive (PD) being assigned to the data producer (20) and being designed for sending the data in the communication channel (10), at least one third primitive (GD) being assigned to the data consumer (30) and being designed for providing access to the data in the communication channel (10), and at least one fourth primitive (PS) being assigned to the data consumer (30) and being designed for releasing at least part or region (44) of the memory space (40), said part or region (44) being designed for transferring the data.

5. The communication protocol according to at least one of claims 2 to 4, characterized in that the memory space (42) being allocated for sending the data to the communication channel (10) and/or the memory space (44) being allocated for transferring the data is freed by the further primitive (U).

6. A primitive (U) for at least one processing module (100), the processing module (100) being designed for processing data via at least one communication channel (10), wherein at least one state of the communication channel (10) is alternated by executing at least one task (t1, t2, t3, t4), characterized by restoring the state of the communication channel (10) to the state before alternation.

7. A method of streaming and/or of synchronizing at least one processing module (100) by at least one primitive (GD, GS, GT, PD, PS), the processing module (100) being designed for processing data via at least one communication channel (10), wherein at least one state of the communication channel (10) is alternated by executing at least one task (t1, t2, t3, t4), characterized in that the state of the communication channel (10) can be restored to the state before alternation.

8. The method according to claim 7, characterized in that at least part or region (42) of memory space (40) in the communication channel (10) for sending the data to the communication channel (10) is allocated by at least one first primitive (GS), that the data is sent in the communication channel (10) by at least one second primitive (PD), that access to the data in the communication channel (10) is provided by at least one third primitive (GD), and that at least part or region (44) of the memory space (40) for transferring the data is released by at least one fourth primitive (PS).

9. The method according to claim 7 or 8, characterized in that the memory space being allocated for sending the data to the communication channel (10) and/or that the memory space being allocated for transferring the data is released by the further primitive (U).

10. Use of at least one processing module (100) according to claim 1 and/or of at least one communication protocol according to at least one of claims 2 to 5 and/or of at least one primitive (U) according to claim 6 and/or of the method according to at least one of claims 7 to 9 for at least one embedded system, for at least one system on chip, for at least one multi-board system, for at least one multi-chip system, for at least one multi-computer system.