WO2001090888A1

WO2001090888A1 - A data processing system having an address generation unit with hardwired multidimensional memory indexing support

Info

Publication number: WO2001090888A1
Application number: PCT/EP2000/004671
Authority: WO
Inventors: Jean-Paul Theis
Original assignee: Theis Jean Paul
Priority date: 2000-05-23
Filing date: 2000-05-23
Publication date: 2001-11-29

Abstract

The present invention describes a data processing system (microprocessor, CPU, DSP, micro-controller) with an address generation unit with hardwired multidimensional memory indexing support. The address generation unit is able to compute one or more multidimensional memory indexes based on only a limited number of initialization and operation data, without requiring explicit instructions in the program code. The advantages of such an 'intelligent' hardwired address generation unit are substantial savings in program code size and power consumption, and a substantial gain in effective processing speed/power.

Description

A data processing system having an address generation unit with hardwired multidimensional memory indexing support

1. Field of the invention

The present invention relates to the field of architecture design of data processing systems in general. More specifically, the invention deals with architecture design issues at register transfer level of a processing system containing a processing device and an address generation unit with hardwired multidimensional memory indexing support.

2. Conventions, definition of terms, terminology

In the context of the present invention, the term 'processing device' means one of the following: microprocessor, CPU, DSP or micro-controller, these terms having the meaning commonly given in the literature. As usual, it is further assumed that the machine code of a program running (executed) on said processing device contains exclusively instructions specific to said processing device, said machine code being obtained either by compiling the source code of said program or by writing it manually. The source code of said program is usually written in a high-level programming language such as C, Pascal, Basic, Fortran or Java. In the context of the present invention, the term 'processing system' means a processing device (microprocessor, CPU, DSP or micro-controller) coupled (connected) to an address generation unit as shown in figure 1. In practice, however, the address generation unit based on the present invention can be part of the processing device itself, the processing device and the address generation unit together forming one integrated circuit (IC) with the same functionality as a microprocessor, CPU, DSP or micro-controller. The reason for using two different terms, 'processing system' and 'processing device', is to clearly delimit and define the functionality of the address generation unit by conceptually splitting it off (as shown in figure 1) from the 'rest of the processing system' and by identifying the 'rest of the processing system' with the 'processing device'. Furthermore, this makes it possible to define the data exchange (communication) between the address generation unit and the processing device.
Despite this conceptual splitting-off from the processing device, the term 'address generation unit' (AGU) has here the same meaning as in the literature, namely hardware circuitry used to perform address calculations, the calculated addresses referring to data (including instruction data) used by a program running on the processing device. An address generation unit may be part of a memory management unit, but not vice versa. Furthermore, an address generation unit may also load/store the program data from/to a memory or cache, in which case it may also be part of a cache controller. The register-transfer level architecture of said processing system considers only (1) elementary building blocks, e.g. the address generation unit and the processing device, (2) the input and output data of each building block, (3) the functionality of each building block, i.e. how the output data are calculated from the input data, and (4) the connections and data exchanged between the building blocks. Therefore, implementation details such as intermediate amplifiers, buffers, latches and registers, which might be inserted between or inside elementary building blocks, are not considered, since (1) they do not change the register transfer level architecture, and (2) although they may change the timing (due to the insertion of buffers, latches and registers), they do not change the functionality.

Since the term 'multidimensional memory indexing' has no clearly defined meaning in the literature, it is now defined in detail for the purpose of the present invention. Consider a block of nested loops, including the case of a single loop, being part of the source code of a program running (being executed) on the considered processing device. Assuming that the program source code is written in a high-level programming language such as C, Pascal, Basic, Fortran or Java, the term 'loop' refers as usual to a 'for'-, 'while'- or 'do'-loop, conditional statements refer to 'if-then-else' statements, and branch/jump statements refer to 'goto'- or 'exit'-statements. Furthermore, the loop bodies of loops may contain any mixture of conditional, branch and jump statements, where said statements may also be nested. Consider an n-dimensional variable $b[i_1][i_2]\dots[i_n]$, $i_j$, $j = 1, \dots, n$, being the indexes of the variable b, and a block of k nested loops with loop indexes $m_1, m_2, \dots, m_k$. An instance of the n-dimensional variable b appearing in the loop body of a loop being part of the considered block of nested loops is of the form $b[\mathrm{expr}_1(m_1, \dots, m_k)][\mathrm{expr}_2(m_1, \dots, m_k)]\dots[\mathrm{expr}_n(m_1, \dots, m_k)]$, where $\mathrm{expr}_i(m_1, \dots, m_k)$, $i = 1, 2, \dots, n$, is an arbitrarily complex expression representing index i of the considered instance and depending on the loop indexes $m_1, m_2, \dots, m_k$. The memory index of the considered instance is defined to be the address within a physical memory or cache to/from which the value of (or the data corresponding to) the considered instance has to be loaded/stored. The dimension of the memory index corresponding to an instance of an n-dimensional variable is equal to n. Furthermore, each instance of a multidimensional variable has its own memory index, possibly of a different form. The form taken by the memory index depends on how the memory/cache is organized.
Two important forms of memory indexes (and thus of memory organizations) are considered below. Note that n (the dimension of the considered variable) may actually be larger than k (the number of nested loops), as exemplified by the excerpt of a program source code listed below, where only one loop is present (k = 1) but a 2-dimensional variable $a[i_1][i_2]$ appears in the body of the for-loop (see below for details).

First, consider two examples illustrating the concept of instances of multidimensional variables. The excerpt of a program source code written in C and listed in section 3 contains a single for-loop with loop index m, and 7 different instances of a 2-dimensional variable $a[i_1][i_2]$ are present in the loop body, namely a[m][m], a[m+1][m], a[m][m+1], a[m+1][m+1], a[m+2][m+1], a[m][m-1] and a[m-1][m-1]. Note that the indexes of each instance involve different expressions in the loop index m. Another simple example is given by the following program, which multiplies two 2-dimensional n x n matrices a and b. The program contains 3 nested for-loops with loop indexes $m_1, m_2, m_3$:

for (m1=1; m1<=n; m1++)
  for (m2=1; m2<=n; m2++)
    for (m3=1; m3<=n; m3++)
      c[m1][m2] += a[m1][m3] * b[m3][m2];

In this example, 3 variable instances appear in the body of the innermost for-loop: 1 instance of variable c, namely c[m1][m2], 1 instance of variable a, namely a[m1][m3], and 1 instance of variable b, namely b[m3][m2]. Note again that the indexes of each instance depend on different loop indexes. This example also illustrates the concept of the nesting level of a loop that is part of a block of nested loops. In the above example, the outermost loop with loop index $m_1$ has nesting level 1, the loop with loop index $m_2$ has nesting level 2, and the innermost loop with loop index $m_3$ has nesting level 3. This concept extends in the same way to the general case of a block of k nested loops.

Two specific forms of memory indexes of high practical interest are now discussed.

(1) If the physical memory/cache is linearly addressed, in other words if it is organized as a linear (one-dimensional) array, the n-dimensional memory index corresponding to an instance $b[\mathrm{expr}_1(m_1, \dots, m_k)][\mathrm{expr}_2(m_1, \dots, m_k)]\dots[\mathrm{expr}_n(m_1, \dots, m_k)]$ of an n-dimensional variable $b[i_1][i_2]\dots[i_n]$ appearing in the body of a loop being part of a block of k nested loops is very often of the form $b_1 m_1 + b_2 m_2 + \dots + b_k m_k + b_{k+1}$, where $b_j$, $j = 1, \dots, k+1$, are integer coefficients and $m_j$, $j = 1, \dots, k$, are the loop indexes of the nested loops. Memory indexes of the form $b_1 m_1 + b_2 m_2 + \dots + b_k m_k + b_{k+1}$ are also called linearized memory indexes. For example, in the matrix multiplication program listed above, the linearized memory index corresponding to instance b[m3][m2] would be $n m_3 + m_2$. Note that the coefficient n is given by the upper boundary value of loop index $m_2$. Referring to the previously mentioned program source code excerpt written in C and listed below, the linearized 2-dimensional memory indexes corresponding to the 7 instances a[m][m], a[m+1][m], a[m][m+1], a[m+1][m+1], a[m+2][m+1], a[m][m-1] and a[m-1][m-1] of the 2-dimensional variable $a[i_1][i_2]$, $1 \le i_1, i_2 \le K$, K being an integer constant (the row length), are $Km + m$, $K(m+1) + m$, $Km + m + 1$, $K(m+1) + m + 1$, $K(m+2) + m + 1$, $Km + m - 1$ and $K(m-1) + m - 1$ respectively. Each of these is of the linearized form $b_1 m + b_2$ with $b_1 = K + 1$.

(2) If the memory/cache is organized as an n-dimensional array, then the memory index corresponding to an instance $b[\mathrm{expr}_1(m_1, \dots, m_k)][\mathrm{expr}_2(m_1, \dots, m_k)]\dots[\mathrm{expr}_n(m_1, \dots, m_k)]$ of an n-dimensional variable $b[i_1][i_2]\dots[i_n]$ appearing in the body of a loop being part of a block of k nested loops has the form of the n-tuple $[\mathrm{expr}_1(m_1, \dots, m_k)][\mathrm{expr}_2(m_1, \dots, m_k)]\dots[\mathrm{expr}_n(m_1, \dots, m_k)]$, where each $\mathrm{expr}_l(m_1, \dots, m_k)$, $l = 1, 2, \dots, n$, is often of the form $b_{l,1} m_1 + b_{l,2} m_2 + \dots + b_{l,k} m_k + b_{l,k+1}$, with $b_{l,j}$, $j = 1, \dots, k+1$, $l = 1, \dots, n$, being integer coefficients. For the previously mentioned program source code excerpt listed below, the memory indexes of the 7 instances of the 2-dimensional variable $a[i_1][i_2]$ appearing in the for-loop are: [m][m], [m+1][m], [m][m+1], [m+1][m+1], [m+2][m+1], [m][m-1], [m-1][m-1].

As can be seen from these two examples, the memory index of an instance of a multidimensional variable appearing in the loop body of a loop being part of a block of nested loops changes value whenever one or more of the loop indexes on which it depends change value. Therefore, the data required to compute the memory index of an instance $b[\mathrm{expr}_1(m_1, \dots, m_k)][\mathrm{expr}_2(m_1, \dots, m_k)]\dots[\mathrm{expr}_n(m_1, \dots, m_k)]$ of an n-dimensional variable b appearing in the body of a loop being part of a block of k nested loops are:

(1) the actual values of the loop indexes $m_1, m_2, \dots, m_k$, corresponding to the iteration counts of the loops being part of said block of nested loops, the iteration counts being given by the execution of said block of loops on said processing device at a given moment in time

(2) the coefficients required to calculate the memory index, e.g. the integer coefficients $b_j$, $j = 1, \dots, k+1$, in the case of a linearized memory index of the form $b_1 m_1 + b_2 m_2 + \dots + b_k m_k + b_{k+1}$

(3) an offset, which may be added to a preliminary computed memory index in order to obtain the final memory index. If the memory is organized and addressed as an n-dimensional array with the same dimension as the memory index of an instance of an n-dimensional variable, then the offset is n-dimensional and of the form $[c_1][c_2]\dots[c_n]$, $c_i$, $i = 1, 2, \dots, n$, being integer constants. For example, if the instance b[m1+3][m2+10][m3] of a 3-dimensional variable b has to be loaded from memory and has a corresponding offset [2][3][4], then its final memory index is given by preliminary memory index + offset = [m1+3][m2+10][m3] + [2][3][4] = [m1+3+2][m2+10+3][m3+4] = [m1+5][m2+13][m3+4]. Clearly, if the memory index is linearized, the offset is just an integer value.

3. Prior Art

Address generation units found in today's microprocessors, CPUs, DSPs and micro-controllers support a restricted number of relatively simple address calculation modes. The most important address calculation modes commonly supported are (1) indirect, (2) indexed, (3) displacement, (4) post-increment/decrement, (5) modulo and (6) modulo wrap-around (circular addressing). However, none of these modes supports multidimensional memory indexing efficiently. As a consequence, many instructions in the machine code of a program are required just to calculate the memory indexes of instances of multidimensional variables appearing in the program source code. With an address generation unit based on the present invention, however, all these instructions become obsolete and can be dropped, since the memory indexes of instances of multidimensional variables are 'hardwired'; in other words, they are calculated automatically and autonomously without requiring any instructions in the program code. Furthermore, 'hardwired' does not exclude the possibility of selecting between several forms of memory indexes (e.g. linearized and others) which are predefined and stored internally in the address generation unit. In this case, control data can be used to tell the address generation unit which form of memory index to select for each instance of a multidimensional variable.

Multidimensional memory indexing naturally occurs in applications involving matrix and vector operations, e.g. FEM calculations, as well as in image and multidimensional signal processing. As mentioned before, with a prior art address generation unit, many instructions/operations, and hence much program (machine) code size and computation power, are required just to perform multidimensional memory indexing. This is exemplified by the following excerpt of a program source code written in C (taken from 'Numerical Recipes in C', W.H. Press et al.), which determines the eigenvalues of a 2-dimensional upper Hessenberg matrix a and hence requires 2-dimensional memory indexing. As already mentioned, 7 2-dimensional memory indexes in total appear in the for-loop, corresponding to 7 different instances of the 2-dimensional matrix variable a, and they have to be recalculated in each iteration of the for-loop body.

Without 2-dimensional memory indexing support, the corresponding program code (in assembler format), listed below, of a processing device (microprocessor, CPU, DSP, micro-controller) with a prior art address generation unit and a 3-operand instruction set would typically require about 56 instructions, assuming that the physical memory is addressed linearly (as a one-dimensional array).

However, by using an address generation unit with 'hardwired' multidimensional memory indexing support based on the present invention, the program (machine) code (listed after the prior art one) can be reduced to about 33 instructions. This represents a saving of about 41 % in program (machine) code size. Furthermore, such an address generation unit with 'hardwired' multidimensional memory indexing support can perform these calculations much faster. This allows the program (machine) code to be executed faster, resulting in higher effective processing (computation) power. Note that 7 special instructions, denoted by 'MMU', are required to initialize the address generation unit. Furthermore, executing fewer instructions also means consuming less power. About the same improvements are achieved when the physical memory is organized and addressed as a 2-dimensional array.

Therefore, the advantages of address generation units with hardwired multidimensional memory indexing support are threefold : (1) reduced program (machine) code size (2) accelerated program execution (3) reduced power consumption.

The following excerpt of a program source code written in C is taken from 'Numerical Recipes in C', W.H. Press et al., and determines the eigenvalues of a 2-dimensional upper Hessenberg matrix a.

for (m=nn-2; m>=1; m--) {
    z=a[m][m];
    r=x-z; s=y-z;
    p=(r*s-w)/a[m+1][m] + a[m][m+1];
    q=a[m+1][m+1]-z-r-s;
    r=a[m+2][m+1];
    s=fabs(p)+fabs(q)+fabs(r);
    p=p/s; q=q/s; r=r/s;
    if (m==1) break;      /* leave the loop */
    u=fabs(a[m][m-1])*(fabs(q)+fabs(r));
    v=fabs(p)*(fabs(a[m-1][m-1])+fabs(z)+fabs(a[m+1][m+1]));
    if (u+v==v) break;    /* leave the loop */
}

The following program code (in assembler format) for the previous program source code corresponds to a processing device with a conventional address generation unit according to the prior art:

1:  MAC  R2,#n,R2
    LD   R2,(R2)
    SUB  R3,R4,R2
    INC  R7,R1
    MAC  R7,#n,R1
    LD   R7,(R7)
    INC  R8,R1
    MAC  R1,#n,R8
    LD   R1,(R1)
    MULT R9,R3,R5
    SUB  R9,R10
    DIV  R9,R7
    ADD  R9,R1
    LD   R1,#(nn-2)
    INC  R1
    MAC  R1,#n,R1
    LD   R11,R1
    SUB  R11,R2
    SUB  R11,R3
    SUB  R11,R5
    INC  R3
    ADD  R4,#2
    MAC  R4,#n,R3
    LD   R4,(R4)
    ADD  R5,|R9|,|R11|
    ADD  R5,R5,|R4|
    DIV  R9,R5
    DIV  R11,R5
    DIV  R4,R5
    LD   R1,#(nn-2)
    CMP  R1,#1
    JMPE 56
    SUB  R12,R1,#1
    MAC  R1,#n,R12
    LD   R1,(R1)
    ADD  R3,|R11|,|R4|
    MULT R3,|R1|,R3
    DEC  R1
    MAC  R1,#n,R1
    LD   R1,(R1)
    ADD  R11,|R1|,|R2|
    LD   R1,#(nn-2)
    INC  R1
    MAC  R1,#n,R1
    LD   R1,(R1)
    ADD  R11,|R1|
    MULT R11,R9
    ADD  R3,R11
    CMP  R3,R11
    JMPE 56
    LD   R1,#(nn-2)
    JMPE R1-,#1
56:

The following program code (in assembler format) for the previous program source code corresponds to a processing system containing a processing device and an address generation unit based on the present invention:

    MMU1 R1,#n,R1
    MMU2 R1+,#n,R1
    MMU3 R1,#n,R1+
    MMU4 R1+,#n,R1+
    MMU5 R1+#2,#n,R1+
    MMU6 R1,#n,R1-
    MMU7 R1-,#n,R1-
1:  SUB  R3,R4,A1
    SUB  R5,R6,A1
    MULT R9,R3,R5
    SUB  R9,R10
    DIV  R9,A2
    ADD  R9,A3
    SUB  R11,A4
    SUB  R11,R3
    SUB  R11,R5
    ADD  R5,|R9|,|R11|
    ADD  R5,R5,|A5|
    DIV  R9,R5
    DIV  R11,R5
    DIV  R4,R5
    CMP  R1,#1
    JMPE 26
    ADD  R4,|R4|,|R11|
    MULT R4,R4,|R11|
    ADD  R11,|R11|,|R12|
    ADD  R11,R12,A4
    MULT R11,R11,|R9|
    ADD  R4,R11
    CMP  R4,R11
    JMPE 26
    JMPE R1-,#1
26:

4. Brief description of the drawings

Figure 1 shows a processing system containing a processing device (microprocessor, CPU, DSP, micro-controller) and an address generation unit with 'hardwired' multidimensional memory indexing support as based on the present invention.

5. Detailed description of the drawings

The main aspects of the present invention are described by referring to figure 1 mentioned in this section.

Figure 1 shows the register transfer level architecture of a processing system containing a processing device (microprocessor, CPU, DSP, micro-controller) and an address generation unit with 'hardwired' multidimensional memory indexing support based on the present invention. In addition, figure 1 shows the data required for the initialization and operation of the address generation unit and the data exchanged between the processing device and the address generation unit. Also shown is a memory/cache from/to which the address generation unit may optionally load/store data according to the addresses given by the computed memory indexes.

The data required for the initialization and operation of the address generation unit are now discussed in more detail. The address generation unit is capable of computing a number of predefined forms of memory indexes, e.g. the form corresponding to a linearized memory index (see above for details), which are stored internally and which are selected by control data during initialization of the address generation unit, as described below.

1. Consider a block of nested loops, with loop indexes $m_1, m_2, \dots, m_k$, being part of the source code of a program whose machine code is going to be executed on the processing device, where the loop bodies of said loops may contain any mixture of conditional, branch and jump statements, and where said statements may also be nested. Consider further one or more instances of multidimensional variables appearing in some loop bodies of loops being part of said block of nested loops. The initialization and the operation of the address generation unit are as follows:

2. Before execution of said machine code, the address generation unit is initialized for each instance considered in 1.:
a. with control data which select, out of several predefined forms, the form of the memory index of the considered instance to be computed, e.g. the form of a linearized memory index. Remember that the form of a memory index depends on how the physical memory/cache is addressed.
b. with the integer coefficients required to calculate the memory index, said memory index having the form selected as specified in 2.a, e.g. the coefficients $b_j$, $j = 1, \dots, k+1$, in the case of a linearized memory index of the form $b_1 m_1 + b_2 m_2 + \dots + b_k m_k + b_{k+1}$.
c. optionally, with an offset, which may be multidimensional and which is added to a preliminary computed memory index of the considered instance to obtain the final memory index.
The data mentioned under 2.a-2.c are transmitted from the processing device to the address generation unit, and no further data than those mentioned under 2.a-2.c are required to initialize the address generation unit. This does not, however, exclude the possibility that, for practical reasons, additional data are exchanged between the processing device and the address generation unit in order to initialize the address generation unit.

3. When the processing device starts executing said machine code, it transmits control data to the address generation unit which tell the address generation unit to start operation.

4. During execution of said machine code, the operation of the address generation unit is as follows:
d. Whenever a loop index of a loop being part of said block of nested loops changes value (due to execution of said machine code), the new value of said loop index is transmitted from the processing device to the address generation unit, such that the latest values of the loop indexes of said loops transmitted from the processing device to the address generation unit as stated here represent the so-called actual values of the loop indexes of said loops. Note that this transmission scheme, triggered by the change of one or more loop index values, does not exclude the possibility of synchronizing, by means of a clock signal, the transmission of the values of the loop indexes as well as the calculation of the memory index of the considered instance.
e. For each instance considered in 1., the address generation unit calculates the corresponding memory index using the data specified in 2.a-2.c and based on the actual values of the loop indexes as specified in 4.d. Note that the address generation unit may also calculate said memory index with modified (e.g. incremented) values of the loop indexes. In other words, the address generation unit may modify, e.g. increment, the actual values of the loop indexes transmitted by the processing device as specified in 4.d, and calculate a memory index using these modified loop index values. As a consequence, the memory/cache address given by a memory index calculated using, e.g., incremented loop index values contains a value (namely that of the considered instance) which might be required by the program a few iterations later (how many iterations later depends on which loop index values are incremented). Such a memory index (address) calculation ahead of the actual program code execution, together with the subsequent loading of the memory/cache data stored at the address given by the calculated memory index, makes it possible to hide the latency (access time) of the memory/cache and to avoid slowing down program execution.
f. Optionally, whenever the memory index of an instance considered in 1. changes value, the address generation unit either loads the value of the considered instance from the physical memory/cache address given by the new value of the memory index and transmits that value to the processing device, or stores the value of the considered instance to said physical memory/cache address, in which case that value was transmitted from the processing device to the address generation unit before being used by the address generation unit.
g. Optionally, during operation, the address generation unit calculates the memory index of an instance considered in 1. by using, in addition to the data specified in 2.a-2.c and 4.d, an offset, which may be multidimensional and which is added to a preliminary computed memory index of said instance to obtain the final memory index, where said offset is computed by the processing device and transmitted from the processing device to the address generation unit whenever said offset changes value, before the new value of said offset is used by the address generation unit.
h. During operation, the address generation unit requires no further data than those mentioned under 2.a-2.c, 4.d and optionally 4.g to calculate the memory index of an instance considered in 1. This does not, however, exclude the possibility that, for practical reasons, additional data are exchanged between the processing device and the address generation unit.

Note that the operation of the address generation unit, as well as the data it requires for the calculation of a memory index as described above, covers the following two cases:

(1) one or more instructions/operations to be performed within the loop body of a loop being part of said block of nested loops are executed in parallel (simultaneously), sequentially, or partially sequentially and partially in parallel

(2) the execution (on said processing device) of one or more loops being part of said block of nested loops is overlapping; in other words, one or more of the instructions/operations contained in the loop body that have to be performed during some iteration of one of said loops are executed before all the instructions/operations of the previous iteration of said loop have been completely executed. In this case, whenever a new iteration of said loop is started, the processing device modifies the value of the loop index of said loop and transmits the new value to the address generation unit.

6. Summary of the invention

The present invention concerns a processing system containing a processing device (microprocessor, DSP, CPU, micro-controller) and an address generation unit with 'hardwired' multidimensional memory indexing support according to claim 1.

Claims

What is claimed is:

1. A processing system containing a processing device and an address generation unit, where the machine code of a program, whose source code contains a block of nested loops, is going to be executed on said processing device, with one or more instances of multidimensional variables appearing in some loop bodies of loops being part of said block of nested loops, where said address generation unit contains one or more predefined forms of memory indexes which are selectable by control data as described under 1.i, and where, before the execution of said machine code, the address generation unit is initialized, for each previously considered instance:
i. with control data which select, out of one or more predefined forms, the form of the memory index of said instance to be computed;
j. with the integer coefficients required to calculate said memory index, said memory index being of the form selected as specified in 1.i;
where no further data than those mentioned under 1.i and 1.j are required to initialize the address generation unit, where the data mentioned under 1.i and 1.j are transmitted from the processing device to the address generation unit, where, upon starting execution of said machine code, the processing device transmits control data to the address generation unit which tell the address generation unit to start operation, and where, during execution of said machine code on the processing device, the operation of the address generation unit is as follows:
k. whenever a loop index of a loop being part of said block of nested loops changes value, due to execution of said machine code on the processing device, the new value of the considered loop index is transmitted from the processing device to the address generation unit, such that the latest values of the loop indexes of said loops transmitted from the processing device to the address generation unit as stated here represent the so-called actual values of the loop indexes of said loops, used to calculate the memory index of each considered instance as stated in 1.l;
l. for each considered instance, the address generation unit calculates the corresponding memory index using the data specified in 1.i, 1.j and 1.k;
m. during operation, the address generation unit requires no further data than those specified in 1.i, 1.j and 1.k to calculate the memory index of the considered instance;
where all the memory indexes computed by the address generation unit have at least dimension 2.

2. A processing system containing a processing device and an address generation unit as claimed in claim 1, where the address generation unit is initialized, for each considered instance and in addition to the data mentioned in 1.i and 1.j, with an offset which may be multidimensional and which is added to a preliminary computed memory index of the considered instance to obtain the final memory index, and where no further data than those mentioned above are required to initialize the address generation unit.

3. A processing system containing a processing device and an address generation unit as claimed in claim 1, where, during operation, the address generation unit calculates the memory index of a considered instance by using, in addition to the data specified in 1.i, 1.j and 1.k, an offset, which may be multidimensional and which is added to a preliminary computed memory index of said instance to obtain the final memory index, where said offset is computed by the processing device and transmitted from the processing device to the address generation unit whenever said offset changes value, before the new value of said offset is used by the address generation unit, and where, during operation, the address generation unit requires no further data than those mentioned above to calculate the memory index of said instance.

4. A processing system containing a processing device and an address generation unit as claimed in claim 1, where, whenever the memory index of a considered instance changes value, the address generation unit either loads the value of said instance from the physical memory/cache address given by the new value of said memory index and transmits that value to the processing device, or stores the value of said instance to said physical memory/cache address, in which case that value was transmitted from the processing device to the address generation unit before being used by the address generation unit.

5. A processing system containing a processing device and an address generation unit as claimed in claim 1, where all the memory indexes computed by the address generation unit are linearized.

6. A processing system containing a processing device and an address generation unit as claimed in claim 5, where the address generation unit is initialized, for each considered instance and in addition to the data mentioned in 1.i and 1.j, with an offset which may be multidimensional and which is added to a preliminary computed memory index of the considered instance to obtain the final memory index, and where no further data than those mentioned above are required to initialize the address generation unit.

7. A processing system containing a processing device and an address generation unit as claimed in claim 5, where, during operation, the address generation unit calculates the memory index of a considered instance by using, in addition to the data specified in 1.i, 1.j and 1.k, an offset, which may be multidimensional and which is added to a preliminary computed memory index of said instance to obtain the final memory index, where said offset is computed by the processing device and transmitted from the processing device to the address generation unit whenever said offset changes value, before the new value of said offset is used by the address generation unit, and where, during operation, the address generation unit requires no further data than those mentioned above to calculate the memory index of said instance.

8. A processing system containing a processing device and an address generation unit as claimed in claim 5, where, whenever the memory index of a considered instance changes value, the address generation unit either loads the value of said instance from the physical memory/cache address given by the new value of said memory index and transmits that value to the processing device, or stores the value of said instance to said physical memory/cache address, in which case that value was transmitted from the processing device to the address generation unit before being used by the address generation unit.