CA2086229A1

CA2086229A1 - Number theory mapping generator for addressing matrix structures

Info

Publication number: CA2086229A1
Application number: CA002086229A
Authority: CA
Inventors: Warren Marwood
Original assignee: Australian Government
Current assignee: Luminis Pty Ltd; Australian Government
Priority date: 1990-06-27
Filing date: 1991-06-27
Publication date: 1991-12-28
Also published as: JPH05509426A; US5398322A; WO1992000563A1; EP0536242A4; EP0536242A1

Abstract

This invention relates to the generation of number theory mappings and their application to the addressing of matrix structures through the provision of an address generator which is optimised for the general task of applying "on the fly" number theory mappings to matrix operands as they are fetched from memory. The address generator (10) comprises a set of six matrix descriptor storage registers (11-16), a finite state machine controller (17), a decrementer (18) for cyclically decrementing a matrix size descriptor (12), an additional decrementer (19) for cyclically decrementing the matrix size descriptor (11), a finite difference engine (20) which adds one of two matrix difference descriptors (13 or 14) to a previously calculated address value obtained from the address register (21), a modulo arithmetic computation unit (22) which computes the residue of the finite difference engine output (20) modulo the matrix modulo descriptor (15), and an adder (23) which adds an offset value stored as the matrix base descriptor (16) to the output of the modulo arithmetic computation unit (22). The output sequence from the base adder (23) is the desired address generator output.

Description

WO 92/00563 PCT/AU91/00272

A NUMBER THEORY MAPPING GENERATOR FOR ADDRESSING MATRIX STRUCTURES

This invention relates to the generation of number theory mappings and their application to the addressing of matrix structures in computer systems. The invention provides particular advantages when used in computer architectures that include systolic processors.

BACKGROUND OF THE INVENTION

Address generation has been a continuing problem for computers, particularly as operation speeds have increased. As the number and range of applications has expanded, there has arisen a need for address generators to produce a range of non-sequential address sequences. A common example is the bit-reversed addressing for the FFT.

Common Digital Signal Processing (DSP) addressing patterns include:

Sequential
Inverted
Reflected
Bit-reversed
Perfect shuffled (Interleaved)
Multiple shuffled
Parallel shuffled

These patterns are in common usage as a result of the vector nature of the common computer architectures. Reference can be made to the paper by ZOBEL, R.N., "Some alternative techniques for hardware address generators for digital signal processors", ISCAS'88, CH2458-8/88, pp. 69-72, 1988, for a description of hardware which implements these addressing patterns. Another paper which describes a versatile hardware address indexing unit is by NWACHUKWU, E.O., "Address generation in an array processor", IEEE Trans. on Computers, Vol. C-34, No. 2, pp. 170-173, February 1985. A further paper which in part describes a general purpose address generation technique is the paper by HALL, F.E. and ROCCO Jr., A.G., "A compact programmable array processor", The Lincoln Laboratory Journal, Volume 2, Number 1, 1989.

In all of these papers the address generation techniques are designed to optimise vector-based algorithms. Matrices and matrix algorithms are supported as operations upon sets of vectors. These approaches have limitations when matrix algorithms are implemented which cannot readily be expressed in terms of sets of vectors. An example of such an algorithm is the one-dimensional Fourier transform implemented with the prime factor algorithm.

The object of this invention is the provision of an address generator which is optimised for general matrix algorithms and which is capable of applying 'on-the-fly' number theory mappings to matrix operands as they are fetched from memory.

MAPPING GENERATOR ARCHITECTURE

A conventional approach to the problem of addressing matrices stored in a linear memory space is to consider addressing the elements of the matrix in terms of strides. The strides specify the linear distance between successive elements of rows or columns respectively.

The problem with this approach is that it is not possible to both fetch matrix operands and simultaneously apply general number theory mappings. The mappings must be applied to the matrices as separate operations. These operations must be done in software in a conventional machine and incur significant time penalties.

Examination of conventional matrix storage schemes shows that they can be considered as simple mappings between one-dimensional and multi-dimensional subspaces. Address generation for the matrices can therefore be performed by carrying out a particular mapping from one dimension to or from two or more dimensions. This can be provided by constructing a hardware implementation of a general number theory mapping. The hardware must provide a general capability to support mappings from one dimension to multi-dimensional subspaces.

The solution to the problem is therefore to replace the conventional address generator with a hardware architecture which implements a general number theory mapping, which unlike prior software for the implementation of particular linear transform algorithms is generally applicable to a range of problems which includes but is not limited to linear transforms.
SUMMARY OF THE INVENTION

In its broadest form the invention is an address generator to generate at its output, output addresses of elements of an N-dimensional matrix comprising storage means for storing at least four descriptors representative of a matrix structure, wherein at least two of said descriptors are size descriptors representative of the size of two dimensions of an N-dimensional matrix and at least a further two of said descriptors are difference descriptors representative of the values of two finite address differences, a counter means per size descriptor, a sequential N-dimensional finite difference calculation means having as input said difference descriptors and an initially zero previously calculated output address from the address generator whereby said counter means controls the finite difference used by the calculation means to calculate a calculation means output address which is also the address generator output address of an element of an N-dimensional matrix.

In a further aspect of the invention the address generator further comprises the use of a modulo descriptor representative of an address value which is greater than all matrix element addresses, subtraction means for subtraction from said calculation means output address said modulo descriptor to produce a modulo subtraction means result and a subtraction means sign output, selection means to select as the output of said address generator either of the calculation means output address or the subtraction means output address according to the subtraction means sign output.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 depicts a schematic representation of a preferred number theory mapping address generator.

Figure 2 is a software simulation of the number theory mapping address generator written in the 'C' programming language.

Figure 3 is an extract from 'C' code which implements the generalised T-transform.

Figure 4 is 'C' code which generates a Hartley transform from a T transform.

DETAILED DESCRIPTION OF THE DRAWINGS

The mapping which has been chosen for description by embodiment is known as the alternate integer representation. It has been used to implement a generalised p-dimensional factorization of the T-transform.

Mappings which have also been considered include the Alternate Integer Representation, the Chinese Remainder Theorem and other simple mappings.

From one dimension n to two dimensions (n1, n2), the mappings are:

The Alternate Integer Representation is

$$ n = (n_1 N_2 + n_2 N_1)_N \qquad (1) $$

where $N_1$ and $N_2$ are mutually prime and $N_1 N_2 = N$.

The Chinese Remainder Theorem is

$$ n = \left( n_1 N_2 (N_2^{-1})_{N_1} + n_2 N_1 (N_1^{-1})_{N_2} \right)_N \qquad (2) $$

where $N_1$ and $N_2$ are mutually prime, $N_1 N_2 = N$ and $\left(a\,(a^{-1})_N\right)_N = 1$.

The notation $(a)_N$ means that a is evaluated modulo N.

The Simple Mapping used conventionally to store matrices in linear memory is

$$ n = n_1 N_2 + n_2 \qquad (3) $$

Examination of equations (1) to (3) shows that each can be implemented with a second order difference engine implemented with a modulo arithmetic capability as described in the following embodiment and mapping examples, provided that the constants are chosen appropriately. Consider the following expression:

$$ n = \text{base address} + (n_1 \Delta_1 + n_2 \Delta_2)_q \qquad (4) $$

This maps an element n of an arbitrary matrix [A] stored in a linear address space starting at base address onto the (n1, n2) element of a two-dimensional address space. The operation $(\cdot)_q$ normally requires a division operation. However, by performing a conditional subtraction of q during each calculation the modulo arithmetic can be performed without the complexity of multiplication or division.

To address sequentially all elements of the matrix [A] in some order determined by the constants $\Delta_1$, $\Delta_2$ and q, n1 and n2 are indexed through their respective ranges (the dimensions of the matrix). Some of the choices for these constants are given in Table 1, and the associated mapping is identified.
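As an illustration, equation (4) can be evaluated directly in software. The following is a minimal sketch in C, assuming non-negative constants; the function names map_address and map_all are hypothetical and do not appear in the patent figures. It uses the % operator for the modulo reduction, which the hardware described here is designed to avoid.

```c
#include <assert.h>

/* Equation (4): address = base + (n1*delta1 + n2*delta2) mod q.
 * Names are illustrative only; they are not taken from the figures. */
int map_address(int base, int delta1, int delta2, int q, int n1, int n2)
{
    assert(q > 0 && delta1 >= 0 && delta2 >= 0 && n1 >= 0 && n2 >= 0);
    return base + (n1 * delta1 + n2 * delta2) % q;
}

/* Fill out[] with the addresses of an N1 x N2 index space, with n2
 * varying fastest. Returns the number of addresses written. */
int map_all(int base, int delta1, int delta2, int q,
            int N1, int N2, int *out)
{
    int count = 0;
    for (int n1 = 0; n1 < N1; n1++)
        for (int n2 = 0; n2 < N2; n2++)
            out[count++] = map_address(base, delta1, delta2, q, n1, n2);
    return count;
}
```

With delta1 = N2, delta2 = 1 and a q larger than any address, map_all yields the conventional row-order sequence; with delta1 = N2, delta2 = N1 and q = N1 N2 it yields the alternate integer representation of equation (1).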


Mapping                          Δ1               Δ2               q
Chinese Remainder Theorem        N2 (N2^-1)_N1    N1 (N1^-1)_N2    N1 N2
Alternate Index Representation   N2               N1               N1 N2
Simple (1)                       N2               N1               max_int
Simple (2)                       N2               1                max_int
Simple (3)                       1                N1               max_int
Simple (4)                       1                K                max_int

Table 1: Values for Δ and q which implement three different mappings from a one-dimensional to a two-dimensional space,

where max_int is the maximum integer of the number representation used. This removes the use of modulo arithmetic. It must be noted that the mappings are not restricted to one-to-one for some parameters.
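The Chinese Remainder Theorem constants of Table 1 can be computed for small co-prime factors as sketched below. This is an illustration only: mod_inverse and crt_constants are hypothetical helper names, and the inverse is found by exhaustive search, which is an assumption made for brevity rather than anything the patent prescribes.

```c
#include <assert.h>

/* Multiplicative inverse of a modulo m by exhaustive search.
 * Returns 0 if no inverse exists (a and m not co-prime). */
int mod_inverse(int a, int m)
{
    assert(m > 1 && a >= 0);
    for (int x = 1; x < m; x++)
        if (((a % m) * x) % m == 1)
            return x;
    return 0;
}

/* Chinese Remainder Theorem row of Table 1:
 *   delta1 = N2 * (N2^-1 mod N1)
 *   delta2 = N1 * (N1^-1 mod N2)
 *   q      = N1 * N2                                   */
void crt_constants(int N1, int N2, int *delta1, int *delta2, int *q)
{
    *delta1 = N2 * mod_inverse(N2, N1);
    *delta2 = N1 * mod_inverse(N1, N2);
    *q = N1 * N2;
}
```

For N1 = 3 and N2 = 5 this gives delta1 = 10, delta2 = 6 and q = 15, so that the mapped address n = (10 n1 + 6 n2) mod 15 satisfies n mod 3 = n1 and n mod 5 = n2.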

Some matrix types for which this addressing technique provides access include:

Dense matrices
Diagonal matrices
Circulant matrices (e.g. the unit matrix)
Constant matrices
Number Theory mapped matrices

MAPPING GENERATOR IMPLEMENTATION

Figure 1 shows a block schematic of a circuit which implements the difference engine of equation (4). 'C' code which simulates the generator is given in Figure 2. The address generator assumes a two-phase implementation using a Programmable Logic Array (PLA) to generate the necessary control signals.

The address generator 10 of Figure 1 comprises in this embodiment a set of six matrix descriptor storage registers 11-16, a finite state machine controller 17, a decrementer 18 for cyclically decrementing a matrix size descriptor 12, an additional decrementer 19 for cyclically decrementing the matrix size descriptor 11, a finite difference engine 20 which adds one of two matrix difference descriptors 13 or 14 to a previously calculated address value obtained from the address register 21, a modulo arithmetic computation unit 22 which computes the residue of the finite difference engine output 20 modulo the matrix modulo descriptor 15, and an adder 23 which adds an offset value stored as the matrix base descriptor 16 to the output of the modulo arithmetic computation unit 22. The output sequence from the base adder 23 is the desired address generator output.

The finite difference engine 20 has inputs from matrix difference descriptors 13 and 14 which represent difference values which are to be conditionally added to the contents of address register 21. The initial state causes a zero address to be computed by the difference engine by directing to the inputs of the adder 24 through 2:1 multiplexer 25 the ones complement of the matrix difference descriptor 14, and through 2:1 multiplexer 26 the unchanged matrix difference descriptor 14 and a non-zero carry value supplied by the finite state machine 17. For subsequent states multiplexer 25 directs the contents of address register 21 to the adder 24, and multiplexer 26 selects as a function of its control input either matrix difference descriptor 13 or matrix difference descriptor 14. The finite state machine 17 controls both multiplexers 25 and 26 in their various states and provides a zero carry value to the adder 24 in all states other than the initial state. For matrices of higher dimension than two the 2:1 multiplexer 26 is replaced by a multiplexer with a larger number of inputs.

The modulo arithmetic computation unit 22 comprises an adder 27 having as inputs the ones complement of the matrix modulo descriptor 15, the output of the finite difference engine and a logic high carry, whose output is the difference between the finite difference engine output address and the matrix modulo descriptor value. The unit 22 further comprises a multiplexer 28 which is controlled by the sign of the adder 27 result to pass either the unchanged output of the finite difference engine 20 or the output of the adder 27 to its output. The output of the multiplexer 28 is supplied to the address register 21 for use in the computation of the next address. The address supplied by the multiplexer is a valid address sequence for the matrix structure.
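The behaviour of the modulo unit can be sketched in C as follows. The function name modulo_unit is hypothetical; the sketch assumes two's complement integers and an input already known to lie below 2q, which holds when each difference descriptor is smaller than q.

```c
#include <assert.h>

/* One pass through the modulo unit. The subtraction addr - q is formed
 * as addr + ~q + 1 (ones complement of q plus a carry-in of one). The
 * sign of that difference selects which value is passed on:
 *   negative     -> keep the unreduced address
 *   non-negative -> keep the difference                              */
int modulo_unit(int addr, int q)
{
    assert(q > 0 && addr >= 0 && addr < 2 * q);
    int diff = addr + ~q + 1;         /* equals addr - q */
    return (diff < 0) ? addr : diff;  /* sign drives the multiplexer */
}
```

For q = 15 an input of 20 is reduced to 5, while an input of 12 passes through unchanged; no division or multiplication is involved.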


The address sequence obtained from unit 22 can be adapted for use in a computer system by offsetting the values by a constant which is stored as a matrix base descriptor 16, and which is added to the output of unit 22 in a base adder 23.

The cyclic decrementer 18 comprises a serially arranged configuration of a multiplexer 29, a register 30 and a register decrementer/test-means 31. The finite state machine 17 initialises the decrementer 31 to zero in the initial state and consequently causes the test-means to cause the multiplexer 29 to pass the matrix size descriptor 12 to the register 30. In subsequent states the contents of the register 30 are decremented in register decrementer 31 and tested for the zero value. A non-zero value causes the multiplexer 29 to store the register decrementer output in the register 30. A zero value causes the multiplexer 29 to pass the matrix size descriptor 12 to the register 30. This description is applicable to each of the plurality of counters required for an n-dimensional generator.
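A cyclic decrementer of this kind can be modelled in a few lines of C. The sketch below is a behavioural approximation only: the type cyc_dec and the functions cyc_dec_init and cyc_dec_step are hypothetical names, and the reload and decrement are folded into a single step rather than split across clock phases.

```c
#include <assert.h>

/* Behavioural model of one cyclic decrementer (illustrative names). */
typedef struct {
    int size;   /* matrix size descriptor, used as the reload value  */
    int count;  /* contents of the register                          */
    int zero;   /* zero flag from the previous test; 1 forces reload */
} cyc_dec;

/* Initial state: the zero flag is set so that the first step loads
 * the size descriptor into the register. */
void cyc_dec_init(cyc_dec *d, int size)
{
    assert(size > 0);
    d->size = size;
    d->count = 0;
    d->zero = 1;
}

/* One step: reload if the previous test found zero, then decrement
 * and test. Returns the new zero flag. */
int cyc_dec_step(cyc_dec *d)
{
    if (d->zero)
        d->count = d->size;
    d->count -= 1;
    d->zero = (d->count == 0);
    return d->zero;
}
```

For a size descriptor of 3 the zero flag returned by successive steps is 0, 0, 1, 0, 0, 1 and so on, marking the end of every third element.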

s1  s0 | run  z1f  z2f | rld1  rld2 | cin | s1  s0
0   0  | 0    X    X   | 0     0    | 0   | 0   0
0   1  | X    X    X   | 0     0    | 0   | 1   0
1   0  | X    1    0   | 1     0    | 0   | 1   0
1   0  | X    1    1   | 1     1    | 0   | 0   0

TABLE 2: State table for one embodiment of a two-dimensional address generator PLA.

The state table for the address generator is shown in Table 2. Variables are:

s0, s1: state address lines
run: start flag
z1f, z2f: zero flags from decrementers
rld1, rld2: reload signals for the decrementers
cin: carry input to first adder.

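Equations which describe the outputs are:

$$ s_0 = cin = \overline{s_0} \cdot \overline{s_1} \cdot run $$
$$ s_1 = s_0 \cdot \overline{s_1} + \overline{s_0} \cdot s_1 \cdot \overline{z1f} \cdot \overline{z2f} + \overline{s_0} \cdot s_1 \cdot z1f \cdot \overline{z2f} $$
$$ rld1 = \overline{s_0} \cdot s_1 \cdot z1f \cdot \overline{z2f} + \overline{s_0} \cdot s_1 \cdot z1f \cdot z2f $$
$$ rld2 = \overline{s_0} \cdot s_1 \cdot z1f \cdot z2f $$

AREA AND TIME CONSIDERATIONS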
It is assumed that in the schematic of Figure 1, all computations are done in a two-phase clocked system, with the address calculation being performed with combinatorial logic. Further, it is assumed that the circuits are implemented in a CMOS technology and use the simplest possible ripple-carry addition circuits. Using these assumptions there are estimated to be in excess of 5000 transistors in the multiplexer/inversion/addition datapath for a 32-bit implementation, and less than 10000 transistors in the complete generator.

Consider a technology in which the slowest ripple-carry adder has an execution time per bit of 2 ns. The output from the second adder will follow the output from the first by about 3 ns, and will occur about 70 ns after phase 2 is active. This timing determines the start time of the third addition, which requires a further 70 ns including the multiplexer delay. Thus the generator can execute in approximately 200 ns, allowing 50 ns for a phase 1 clock cycle. In an embodiment of the invention only about 10 ns would be needed for the phase 1 clock period.
Additional registers can be added to the circuit to reduce the number of adders. Multiplexer-based Manchester carry circuits can execute at less than 1 ns per bit in current processes. These circuits allow the multiplexing of a single adder to perform the three additions at the expense of some additional registers and control circuitry. Benefits of this approach are that the operands are not required at the same time and so a RAM register-file can be used to minimise the area of the data registers.

If faster address generation is required the adder architecture can be replaced with faster structures. In this case a trade off of area for execution time is performed. To achieve maximum performance the mapping generator can be pipelined, and it is estimated that with existing processes it is reasonable to expect addresses to be generated at a rate that exceeds 50 MHz.

The use of number theory mapping hardware implemented as a difference engine provides access to both normal and transposed matrix structures including constant matrices stored as a single scalar. Circulant matrices are generated from a single row. The generality of the approach makes possible the use of prime factor mappings of dense matrices without time penalty. The prime factor mappings are used to optimise performance when executing algorithms such as convolutions, correlations and a large number of linear transforms. Examples of these transforms include the Fourier transform, the Chirp-z transform and the Hartley transform.

The technique is elegant as the number theory mappings which were originally additional operations are implemented without time penalty. In addition, as the conventional storage schemes for matrices appear as a subset of the mapping capability of the generator, the need for conventional address generation hardware is removed.

MATRIX ADDRESSING EXAMPLES

Arguments which are required by the generator of this embodiment are the set {base, delta1, delta2, n1, n2, q}. The following examples consider 3 x 5, 5 x 3 and 5 x 5 matrices, and show the address sequences which are used for normal, transposed, prime factor mapped and circulant matrices. The examples shown are obtained by executing the simulator provided in Figure 2.

1. Normal Form

A 3 x 5 matrix stored in row order requires a simple linear sequence.

Enter base,delta1,delta2,n1,n2,q
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14

A non-zero base address simply offsets the address sequence, e.g.

Enter base,delta1,delta2,n1,n2,q
100 101 102 103 104 105 106 107 108 109 110 111 112 113 114

2. Transposed Form

Transposition of the above matrix requires the following arguments.

Enter base,delta1,delta2,n1,n2,q
0 5 10
1 6 11
2 7 12
3 8 13
4 9 14

This is a multiple-shuffle of the sequential addresses.

3. Prime factor mappings

A prime factor mapping of the matrix is given by the following:

Enter base,delta1,delta2,n1,n2,q
0 3 8 5 3 15
0 3 6 9 12
5 8 11 14 2
10 13 1 4 7

4. Transposed prime factor mappings

Transposition of the above mapped matrix is obtained by:

Enter base,delta1,delta2,n1,n2,q
0 5 8 3 5 15
0 5 10
3 8 13
6 11 1
9 14 4
12 2 7

5. Circulant matrices

For matrices which are circulant, major savings in both storage and generation time are possible by computing only the first row, and generating the required matrix from this one row. As an example the generator is used to generate a 3 x 5 matrix from a single 5-element array.

Enter base,delta1,delta2,n1,n2,q
0 1 2 3 4
4 0 1 2 3
3 4 0 1 2

This is the technique used to generate the unit matrix I from a single row with a one in the first element position followed by N-1 zeroes (for an order N matrix).

A skew-circulant matrix is generated similarly:

Enter base,delta1,delta2,n1,n2,q
0 1 2 3 4
1 2 3 4 0
2 3 4 0 1

Constant Matrices

Where an algorithm calls for the multiplication of a matrix by a scalar constant, e.g. C = aA, it is readily implemented by a Hadamard multiplication of the matrix by an identically dimensioned matrix whose elements are the desired constant. This is readily achieved by generating a single scalar and then choosing parameters for the mapping hardware which construct the constant matrix from the one scalar, i.e. for a scalar at address 0:

Enter base,delta1,delta2,n1,n2,q

Sub-matrix generation

Sub-matrices are extracted from an arbitrary matrix with appropriate offsets, e.g. the 2 x 2 sub-matrix of the matrix in the first example, starting at element a2,2.

Enter base,delta1,delta2,n1,n2,q
6 1 4 2 2 15
6 7
11 12

Skewed sub-matrices are extracted similarly.
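The address sequences in these examples can be reproduced with a compact behavioural model of the generator. The sketch below is not the Figure 2 simulator: generate and end_of_row_step are hypothetical names, the two-phase clocking and PLA are omitted, and it is assumed that delta1 is added between successive elements of a row while delta2 is added in its place at the end of each row. The helper end_of_row_step follows the relation used by map() in Figure 3, where d2 is set to N2 - (n1 - 1)*N1.

```c
#include <assert.h>

/* Behavioural sketch: n2 rows of n1 addresses. Within a row delta1 is
 * added; at a row boundary delta2 is added instead. A single
 * conditional subtraction of q keeps the running address below q. */
int generate(int base, int delta1, int delta2, int n1, int n2, int q,
             int *out)
{
    int addr = 0, count = 0;
    assert(q > 0 && n1 > 0 && n2 > 0);
    assert(delta1 >= 0 && delta1 < q && delta2 >= 0 && delta2 < q);
    for (int row = 0; row < n2; row++) {
        for (int col = 0; col < n1; col++) {
            if (row != 0 || col != 0) {
                addr += (col == 0) ? delta2 : delta1;
                if (addr - q >= 0)
                    addr -= q;
            }
            out[count++] = base + addr;
        }
    }
    return count;
}

/* Converts a pair of mapping coefficients (inner, outer), as used in
 * equation (4), into the end-of-row difference expected above. */
int end_of_row_step(int inner, int outer, int n1, int q)
{
    int d = (outer - (n1 - 1) * inner) % q;
    return (d < 0) ? d + q : d;
}
```

Called with the arguments 0 3 8 5 3 15 the sketch returns the prime factor mapped sequence of example 3, and with 6 1 4 2 2 15 it returns the sub-matrix addresses 6 7 11 12. The value 8 used for delta2 in example 3 is what end_of_row_step(3, 5, 5, 15) gives for the alternate integer representation coefficients 3 and 5.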

A MULTI-DIMENSIONAL TRANSFORM EXAMPLE

The T transform is a linear transform which uses as its kernel the real function $\mathrm{cas}\,\theta = \cos\theta + \sin\theta$ defined by Hartley in 1942. The discrete transform is defined as a matrix vector product. Alternate integer representation mappings can be used to re-write one-dimensional T transforms whose lengths are factorable into co-prime integers as multi-dimensional transforms. These transforms can be implemented as sets of matrix products. The T transform can be used as a faster algorithm for computing both the Hartley and Fourier transforms for real data.

Hartley, in 1942 [1], defined an alternative real kernel for the Fourier integrals which led to the following transform pair discussed in detail in [2]:

$$ H(f) = \int_{-\infty}^{\infty} X(t)\,\mathrm{cas}(2\pi f t)\,dt \qquad (5) $$

$$ X(t) = \int_{-\infty}^{\infty} H(f)\,\mathrm{cas}(2\pi f t)\,df \qquad (6) $$

where $\mathrm{cas}(2\pi f t) = \cos(2\pi f t) + \sin(2\pi f t)$, X(t) is a real function of time, and H(f) is a real function of frequency.

These integrals are the continuous Hartley transforms. For sampled data systems the Discrete Hartley Transform (DHT) pair can be written as

$$ H(k) = \frac{1}{N} \sum_{n=0}^{N-1} x(n)\,\mathrm{cas}(2\pi n k / N) \qquad (7) $$

$$ x(n) = \sum_{k=0}^{N-1} H(k)\,\mathrm{cas}(2\pi n k / N) \qquad (8) $$

Equations (7) and (8) are best considered as matrix-vector products.

When number theory mappings are applied to the matrix-vector products which define the Hartley transform the mappings convert the one-dimensional matrix-vector product, or convolution, to a multi-dimensional convolution which can be implemented as a series of matrix-matrix multiplications. These higher dimensional transforms are not readily expressed in terms of the kernel.

A related transform which is defined expressly in terms of the cas kernel is the T transform. The one-dimensional T-transform is identical to the Hartley transform, and is defined by equation (7).

THE MULTI-DIMENSIONAL PRIME FACTOR T TRANSFORM

Consider the mapping of the linear input and output vectors x(n) and X(k) into p-dimensional forms using the alternate integer representation maps

$$ n = \left( \sum_{i=1}^{p} \Bigl( \prod_{1 \le j \le p,\; j \ne i} N_j \Bigr) n_i \right)_N \qquad (9) $$

$$ k = \left( \sum_{i=1}^{p} \Bigl( \prod_{1 \le j \le p,\; j \ne i} N_j \Bigr) k_i \right)_N \qquad (10) $$

and the product is given by:

$$ nk = \left( \sum_{i=1}^{p} \Bigl( \prod_{1 \le j \le p,\; j \ne i} N_j^2 \Bigr) n_i k_i \right)_N \qquad (11) $$

where the length N of the vectors is factorable into p co-prime factors $N_1 \ldots N_p$, and

$$ N = \prod_{i=1}^{p} N_i $$

Substitution of this product into the DHT equation (7) converts the one-dimensional Hartley transform into a multi-dimensional transform. However, this multi-dimensional form is not a simple extension of the cas kernel.
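The product relation of equation (11) can be checked exhaustively for the two-factor case. The sketch below uses hypothetical names (air_index, check_product) and covers p = 2 only, where the maps reduce to n = (N2 n1 + N1 n2) mod N and k = (N2 k1 + N1 k2) mod N. The cross terms carry a factor N1 N2 = N and so vanish modulo N.

```c
#include <assert.h>

/* Alternate integer representation for two factors:
 * index = (N2*i1 + N1*i2) mod (N1*N2). */
int air_index(int i1, int i2, int N1, int N2)
{
    return (N2 * i1 + N1 * i2) % (N1 * N2);
}

/* Returns 1 if, for every (n1, n2, k1, k2),
 *   (n * k) mod N == (N2^2 * n1*k1 + N1^2 * n2*k2) mod N,
 * which is equation (11) written out for p = 2. */
int check_product(int N1, int N2)
{
    int N = N1 * N2;
    assert(N1 > 0 && N2 > 0);
    for (int n1 = 0; n1 < N1; n1++)
        for (int n2 = 0; n2 < N2; n2++)
            for (int k1 = 0; k1 < N1; k1++)
                for (int k2 = 0; k2 < N2; k2++) {
                    int n = air_index(n1, n2, N1, N2);
                    int k = air_index(k1, k2, N1, N2);
                    int lhs = (n * k) % N;
                    int rhs = (N2 * N2 * n1 * k1 + N1 * N1 * n2 * k2) % N;
                    if (lhs != rhs)
                        return 0;
                }
    return 1;
}
```

For example, check_product(3, 5) examines all 225 index combinations of a length-15 transform.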

The T multi-dimensional transform is written directly in terms of the kernel, and for the p-dimensional case is written simply as:

$$ T(k_1, k_2, \ldots, k_p) = \frac{1}{N} \sum_{n_1=0}^{N_1-1} \sum_{n_2=0}^{N_2-1} \cdots \sum_{n_p=0}^{N_p-1} x(n_1, n_2, \ldots, n_p)\, \mathrm{cas}\!\left(\frac{2\pi N n_1 k_1}{N_1^2}\right) \mathrm{cas}\!\left(\frac{2\pi N n_2 k_2}{N_2^2}\right) \cdots \mathrm{cas}\!\left(\frac{2\pi N n_p k_p}{N_p^2}\right) \qquad (12) $$

The inverse transform is simply

$$ x(n_1, n_2, \ldots, n_p) = \sum_{k_1=0}^{N_1-1} \sum_{k_2=0}^{N_2-1} \cdots \sum_{k_p=0}^{N_p-1} T(k_1, k_2, \ldots, k_p)\, \mathrm{cas}\!\left(\frac{2\pi N n_1 k_1}{N_1^2}\right) \mathrm{cas}\!\left(\frac{2\pi N n_2 k_2}{N_2^2}\right) \cdots \mathrm{cas}\!\left(\frac{2\pi N n_p k_p}{N_p^2}\right) \qquad (13) $$

This transform is of interest for the following reasons:

it is readily computed with a recursive procedure

filtering can be done in the T domain

the Hartley transform for 2, 3 and 4 dimensions can be readily derived from it

the real Fourier transform for 2, 3 and 4 dimensions is derivable directly from the T transform.

Although Fourier and Hartley transforms of higher dimension than four can be computed with the T transform, it is likely to be less computationally efficient than the direct computation.

The following presents the relationship between the T and Hartley transforms.

The identity

$$ 2\,\mathrm{cas}(a + b) = \mathrm{cas}(a)\mathrm{cas}(b) + \mathrm{cas}(a)\mathrm{cas}(-b) + \mathrm{cas}(-a)\mathrm{cas}(b) - \mathrm{cas}(-a)\mathrm{cas}(-b) \qquad (14) $$

can be used with the two-dimensional T transform to compute the Hartley transform. Let $T(k_1, k_2)$ be the two-dimensional T transform. Then the two-dimensional Hartley transform is given by

$$ 2H(k_1, k_2) = T(k_1, k_2) + T(N_1 - k_1, k_2) + T(k_1, N_2 - k_2) - T(N_1 - k_1, N_2 - k_2) \qquad (15) $$

For the three-dimensional case, the following identity is used:

$$ 2\,\mathrm{cas}(a + b + c) = \mathrm{cas}(-a)\mathrm{cas}(b)\mathrm{cas}(c) + \mathrm{cas}(a)\mathrm{cas}(-b)\mathrm{cas}(c) + \mathrm{cas}(a)\mathrm{cas}(b)\mathrm{cas}(-c) - \mathrm{cas}(-a)\mathrm{cas}(-b)\mathrm{cas}(-c) \qquad (16) $$

Let $T(k_1, k_2, k_3)$ be the three-dimensional T transform. Then the three-dimensional Hartley transform is given by

$$ 2H(k_1, k_2, k_3) = T(N_1 - k_1, k_2, k_3) + T(k_1, N_2 - k_2, k_3) + T(k_1, k_2, N_3 - k_3) - T(N_1 - k_1, N_2 - k_2, N_3 - k_3) \qquad (17) $$

Example code for the computation of the p-dimensional T-transform is given in Figure 3. Additional code which generates the Hartley transform from a two-dimensional T-transform is given in Figure 4. The T-transform is computed directly with two matrix products in this example to minimise overheads. For higher dimensional factorizations than two, the generalised algorithm would be used, and a tensor add real function used to implement the sums as required.


FIGURE 2: 'C' code which simulates the number theory mapping address generator

#include <stdio.h>
#include <math.h>

void pla(clock,s0,s1,run,n1z,n2z,rldn1,rldn2,cin)
int clock,*s0,*s1,run,*n1z,*n2z,*rldn1,*rldn2,*cin;
{
    static int t0,t1,t2,t3,t4;
    if (clock == 0) {
        t0 = (~*s0 & ~*s1 & run)&1;
        t1 = (*s0 & ~*s1)&1;
        t2 = (~*s0 & *s1 & ~*n1z & ~*n2z)&1;
        t3 = (~*s0 & *s1 & *n1z & ~*n2z)&1;
        t4 = (~*s0 & *s1 & *n1z & *n2z)&1;
    }
    else {
        *s0 = t0;
        *s1 = t1|t2|t3;
        *rldn2 = t4;
        *rldn1 = t3|t4;
        *cin = *s0;
    }
}

void dec(reg,zf)
int *reg,*zf;
{
    *zf = (--(*reg)==0);
    /* printf("reg,zf = %d %d\n",*reg,*zf); */
}

void print_states(clock,s1,s0,run,zf1,zf2,rldn1,rldn2,carryin)
int clock,s1,s0,run,zf1,zf2,rldn1,rldn2,carryin;
{
    if (clock==1){
        printf("s1,s0,run,zf1,zf2,rldn1,rldn2,carryin\n");
        printf("%2d %2d %2d %2d %2d %2d %2d %2d\n",
               s1,s0,run,zf1,zf2,rldn1,rldn2,carryin);
    }
}

int mux(sel,a,b)
int sel,a,b;
{
    if (sel==0) return(a);
    else return(b);
}

int add(a,b,c)
int a,b,c;
{
    return(a+b+c);
}

void main()
{
    int clock,s0,s1,run,rldn1,rldn2,carryin;
    int n1, n2, q, reg1, reg2, zf1, zf2;
    int add_reg_o1, add_reg_o2, delta1, delta2, base;
    int mux0, mux1, mux2, add0, add1, add2;

    run = 1; s0 = 0; s1 = 0; rldn1 = 0; rldn2 = 0; carryin = 0; zf1 = 0; zf2 = 0;
    mux2 = 0;
    printf("Enter base,delta1,delta2,n1,n2,q\n");
    scanf("%d %d %d %d %d %d",&base,&delta1,&delta2,&n1,&n2,&q);
    printf("%d %d %d %d %d %d\n",base,delta1,delta2,n1,n2,q);
    reg1 = n1; reg2 = n2;
    while (/* run+s0+s1 */ zf1+zf2 != 2){
        for (clock=0;clock<2;clock++) {
            /* register operations - reload decrement registers if zero */
            if (clock==0) {
                if (zf1==1) reg1=n1;
                if (zf2==1) reg2=n2;
                add_reg_o1 = mux2;
            }
            if (clock==1) {
                dec(&reg1,&zf1);
                if (zf1) dec(&reg2,&zf2);
                add_reg_o2 = add_reg_o1;
            }
            /* reset run flag */
            if (s0==1) run = 0;
            /* generate control signals */
            pla(clock,&s0,&s1,run,&zf1,&zf2,&rldn1,&rldn2,&carryin);
            /* Phase2 operations */
            if (clock==1) {
                mux0 = mux(s0,add_reg_o2,~delta1);
                mux1 = mux(rldn1,delta1,delta2);
                add0 = add(mux0,mux1,carryin);
                add1 = add(add0,~q,1);
                mux2 = mux(add1<0,add1,add0);
                add2 = add(base,mux2,0);
                printf("%d ",add2);
                if (zf1==1) printf("\n");
            }
            /* print_states(clock,s1,s0,run,zf1,zf2,rldn1,rldn2,carryin); */
        }
    }
    printf("\n");
}

FIGURE 3: 'C' code which implements the generalised T-transform

#include <stdio.h>
#include <math.h>
/* #define TIME */
#define SIM
typedef float precision;
typedef struct {
    int init;           /* initial offset */
    int n1;             /* no of cols */
    int n2;             /* no of rows */
    int d1;             /* row element spacing */
    int d2;             /* last row element to first next-row element */
    int modulo;         /* modulo */
    int negate;
    precision *body;    /* data pointer */
} MATRIX;
typedef st~c- {
int p, *n, *N;
MAT~IX *~
} cas_nd_coef;
~define ex 5 #define 4 _ CO O
#define 4_5i 1 Xdefine f_ha 2 Xdefine f_ha_ 3 #define addr a ~oid put(A,i,j,x) MATRIX ~A; int i,j; precision x;
{
int address;
address = A->init~(i*((A->~l-l)*A->dl+A->d2)lj#A->dl);
if (A->modulo) address = address~/A->modulo;
A->body~address~ = x;
} ~.
MATRIX ~ init_har~(func, nl, n) int func, ~l, n;
{

int i, j, k; precision ar~;
MATRIX ~x;
x = define_matrix((precision *) 0,O,l,l,n,n,n~n,0);
for(i=0; i<n; il+){

for(j=0; j<n; j++){
~-g - 2.0*M_PI~(nl*i~j'l~)/n;
k ~
~itch (func){
case f_co:
put~,i,j,cos(arg));
br~ak;
case 4_si: -put(~,i,j,sin(arg));
break;
case f_ha:
put(x,i,j,cos(arg)+sin(~g));
'oreak;
case ~_ha_:
put(~,i,j,cos(-ar~) I sin(-arg));
break;
case addr:
put(z,i,j,(precision) i*x->nl+j);
break;
case I:
put(s,i,j,(precision) k);
br~ak;

}
}
retu~
}
cas_~d_coef * pfnd_cas_coef(p,n) i~t p, ~;
{

cas_nd_coe4 *A;
int i,N;
A = (cas_~d_coef *) malloc(sizeof(cas_~d_coef));
A->~ = (MAT~IX **~ malloc(p*sizeof(MATRIX));
A->~ = (int *) malloc(p*sizeof(int));
A->N = (i~t ~) m~lloc(p~sizeof(int));
A->p = p;
N - l;
for (i=O;i<p;i++) N *= ~ti~;
for(i=O;i<p;i~+){
A->~ti~ = ~ti];
A->Nti] = N/nti~;
A~ i] ~ (MAT~IX *) malloc(sizeof(MAT~IX));
A->~Ci] = init_hart(f_ha,A->N~i],~ti~);

    }
    return A;
}

void map(A,offset,N1,N2)
MATRIX *A; int offset, N1, N2;
{
    A->init = offset;
    A->d1 = N1;
    A->d2 = N2 - (A->n1 - 1)*N1;
}

MATRIX * Pfmap(A, offset, N1, N2)
MATRIX *A; int offset, N1, N2;
{
    map(A, offset, N1, N2);
    return A;
}

~a id Tensor_M~7tiply_real(le~el,p,init,A,X) int level, p, i it;
cas_~d_coef ~A;
MAT~IX *~X;
int i,l,inte~,P,P_l,P_2,*n,*N;
P ~ A->p;
P_l = A->p-l;
P_2 = A->p-2 if ~l~vel < P_2) {
index = (levellp)~/.P;
~or (i=0; i<A->n~index]; i++) {
if (i) init += A->N~index];
Tensor_Multiply_real(level+l,p,i~it,A,X);
}
}
else {
n = A->~; N = A->N;
in it = init 'I, (N [0] ~ ~0] );
X ~ 1] ->~2=X ~l] ->n2 = X ~0~ ->~2 = X to] ->n 2 = n~(?_2+p)%P~;
X~l]->~l=XCl]->~l = X~0]->~l = X~0]->nl = ~(P_l+p)'t.P~;
if(!init) {
X~p&l] = Pfmap(X~p&l],init, N~(P_l+p)~/,P], N~(P_2~p3~/R]);
X~ p)~l~ = Pfmap(X~(l+p)~l~,init,N~(P_l~p)'t.P],N~(P_2~p)'bP]);

w~ ~!n~Sfi~ ?,`~ 3 2 ~ PCT/AU91/00272 X~0]->init=XtO]->init-Xtl]->init-~l]->init=i~it;
X~ p)~l]=~mult3(A->~(P_2~p)~/.P~,X[p~l~,X~(p+l)~l]);
}
} ~ .
precision *
pfcas_nd(A, x) cas_nd_coef *A;
precision *x;
{
    MATRIX *X[2];
    precision *y;
    int p, N;
    N = A->n[0]*A->N[0];
    /* Define temporary storage matrices */
    X[0] = define_matrix(x,0,1,1,A->n[0],A->n~ N,0);
    X[1] = define_matrix((precision *) 0,0,1,1,A->n[0],A->n~ ,0);
    /* Perform the p-dimensional T transform */
    for (p=0; p<A->p; p++) Tensor_Multiply_real(0,p,0,A,X);
    /* Assign the result vector */
    y = X[A->p%2]->body;
    /* Free the temporary storage and return the transform */
    free(X[(A->p+1)%2]->body);
    free(X[0]);
    free(X[1]);
    return(y);
}

FIGURE 4: 'C' code which generates a Hartley transform ~ODI ~ onn.
MATRIX * mod_index_rev(x,u,v) MATRIX *x; int u,v;
/* This function maps a matrix T(u,v) to the matrix T((N-u) % N,(M-v) % M)
as a function of the input arguments u and v. If u or v is true (false), the index is mapped (unmapped). True is 1 and false is 0.
*/
{
    MATRIX *A;
    if (!(A = (MATRIX *) malloc(sizeof(MATRIX)))) printf("Memory allocation for MATRIX struct failed in define_matrix\n");
    A->init = x->init;
    A->n1 = x->n1;
    A->n2 = x->n2;
    switch (2*v+u) {
    case(0): A->d1 = x->d1;
        A->d2 = x->d2;
        break;
    case(1): A->d1 = x->d1;
        A->d2 = (x->d1*(x->n1-1)+x->d2)*(x->n2-1)-A->d1*(x->n1-1);
        break;
    case(2): A->d1 = x->d1*(x->n1-1);
        A->d2 = A->d1+x->d2-A->d1*(x->n1-1);
        break;
    case(3): A->d1 = x->d1*(x->n1-1);
        A->d2 = (x->d1*(x->n1-1)+x->d2)*(x->n2-1)-A->d1*(x->n1-1);
        break;
    }

    A->modulo = x->modulo;
    A->negate = x->negate;
    A->body = x->body;
    return A;
}
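The scaling of a difference by (n-1) in mod_index_rev can be understood from a one-dimensional case: stepping by d*(n-1) modulo n*d visits the same addresses as stepping by -d, which is the index sequence (n-i) % n. The sketch below checks this for a single index. The function name `reversed_walk` is hypothetical, and the sketch assumes the modulo equals n*d, which the listing above does not state in general.

```c
#include <assert.h>

/*
 * Hypothetical illustration: out[i] receives the address reached at step i
 * when the difference is d*(n-1) and the modulo is n*d (an assumption of
 * this sketch). The result equals d * ((n - i) % n).
 */
void reversed_walk(int n, int d, int *out)
{
    int step = d * (n - 1);
    int modulo = n * d;
    int addr = 0;
    int i;

    assert(n > 0 && d > 0);
    for (i = 0; i < n; i++) {
        out[i] = addr;
        addr += step;
        if (addr >= modulo)     /* single conditional subtraction */
            addr -= modulo;
    }
}
```

With n = 4 and d = 3 the addresses visited are 0, 9, 6, 3, i.e. element indices 0, 3, 2, 1.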

precision * pfcascasO(A,x) cas_coef *A; precision *x;
{
    MATRIX *a, *T, *c;
    precision *y;
    int n1, n2, i;
    MATRIX *T10, *T01, *T11, *t;
    /*
    Initialise matrix dimensions using the coefficient matrix dimensions
    */
    n1 = A->ha->n1; n2 = A->ha_->n1;
    /* Define some temporary storage matrices */
    a = define_matrix(x,0,1,1,n2,n1,n1*n2,0);
    T = define_matrix((precision *) 0,0,1,1,n2,n1,n1*n2,0);
    /* Perform the T transform */
    T = (mmult3(mmult2(A->ha,Pfmap(a,0,~ 2)),A->ha_,Pfmap(T,0,n1,n2)));
    y = T->body;
    for (i=0;i<n1*n2;i++) y[i] = y[i]/sqrt((precision) n1*n2)/2.0;
    T10 = mod_index_rev(T,1,0);
    T01 = mod_index_rev(T,0,1);
    T11 = mod_index_rev(T,1,1);
    /* Generate the Hartley matrix from the T matrix */
    t = madd2(T10,T01);
    msub3(t,T11,t);
    madd3(T,t,T);
    /* Free the temporary storage and return the transform */
    Free_Matrix(t);
    Free_Matrix(c);
    free((char *) T);
    free((char *) a);
    free((char *) T10);
    free((char *) T01);
    free((char *) T11);
    return y;
}


Claims

THE CLAIMS DEFINING THE INVENTION ARE AS FOLLOWS:

1. An address generator to generate at its output, output addresses of elements of an N-dimensional matrix comprising storage means for storing at least four descriptors representative of a matrix structure, wherein at least two of said descriptors are size descriptors representative of the size of two dimensions of an N-dimensional matrix and at least a further two of said descriptors are difference descriptors representative of the values of two finite address differences, a counter means per size descriptor, a sequential N-dimensional finite difference calculation means having as input said difference descriptors and an initially zero previously calculated output address from the address generator, whereby said counter means controls the finite difference used by the calculation means to calculate a calculation means output address which is also the address generator output address of an element of an N-dimensional matrix.

2. An address generator according to claim 1 further comprising a modulo descriptor representative of an address value which is greater than all matrix element addresses, subtraction means for subtraction from said calculation means output address said modulo descriptor to produce a modulo subtraction means result and a subtraction means sign output, selection means to select as the output of said address generator either of the calculation means output address or the subtraction means output address according to the subtraction means sign output.

3. An address generator according to claim 2 wherein said selection means selects as the output of said address generator said calculation means output address if said subtraction means sign output is negative or said subtraction means output address if said subtraction means sign output is positive.

4. An address generator according to claim 3 further comprising a base descriptor representative of an offset value, an addition means to add said base descriptor to said selection means output to provide an offset address as the address generator output address of an element of an N-dimensional matrix.

5. An address generator according to claim 1 wherein a respective said counter means further comprises, a selection means having first, second and at least one control input, said first input being a size descriptor, a counter storage means adapted to receive and store a current counter value input from said selection means, a decrementer means adapted to receive and decrement a current counter value from said counter storage means and to output the decremented counter value to said selection means second input, a test means to test said decrementer means output for a zero value and to output a signal to a control input of said selection means which is representative of the true or false result of the test, whereby, a true result of said test controls said selection means to pass a size descriptor from selection means first input to said counter storage means, and, a false result of said test controls said selection means to pass from said selection means second input said decrementer means output to said counter storage means.

6. An address generator according to claim 5 comprising a plurality of counter means, wherein each counter means controls a subsequent counter to operate only when the controlling counter test means signal is representative of a true result.

7. An address generator according to claim 6 comprising a finite state machine having inputs at least two of which are adapted to receive respective test means signals from respective counter means to generate at least one finite state machine control signal to select which of the at least two difference descriptors are input to said finite difference calculation means.
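
By way of illustration only, the behaviour recited in claims 1 to 6 can be sketched in C for the two-dimensional case. The sketch is not the claimed apparatus. The struct `descriptors`, its field names and the function `generate_addresses` are hypothetical, and the sketch assumes that both difference descriptors already lie in the range 0 to modulo-1, so that a single conditional subtraction yields the residue.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor set; the names are illustrative only. */
typedef struct {
    int n1, n2;     /* size descriptors (inner and outer counts) */
    int d1, d2;     /* difference descriptors */
    int modulo;     /* modulo descriptor, greater than every element address */
    int base;       /* base descriptor (offset added to every address) */
} descriptors;

/* Writes n1*n2 addresses to out and returns how many were written. */
size_t generate_addresses(const descriptors *m, int *out)
{
    int addr = 0;               /* previously calculated address, initially zero */
    int c1 = m->n1;             /* counter for the first size descriptor */
    int c2 = m->n2;             /* counter for the second size descriptor */
    int wrap1, diff;
    size_t k = 0;

    assert(m->n1 > 0 && m->n2 > 0 && m->modulo > 0);
    assert(m->d1 >= 0 && m->d1 < m->modulo);   /* assumption of this sketch */
    assert(m->d2 >= 0 && m->d2 < m->modulo);   /* assumption of this sketch */

    for (;;) {
        out[k++] = addr + m->base;             /* base addition */

        /* Counter: decrement, test for zero, reload the size on a true test. */
        wrap1 = (--c1 == 0);
        if (wrap1) {
            c1 = m->n1;
            if (--c2 == 0)                     /* second counter steps only on wrap */
                break;
        }

        /* Finite difference step: the counter state selects the difference. */
        addr += wrap1 ? m->d2 : m->d1;

        /* Modulo by subtraction: keep the difference when its sign is not negative. */
        diff = addr - m->modulo;
        if (diff >= 0)
            addr = diff;
    }
    return k;
}
```

With n1 = 5, n2 = 3, d1 = 3, d2 = 8, modulo = 15 and base = 0, the sketch produces 0, 3, 6, 9, 12, 5, 8, 11, 14, 2, 10, 13, 1, 4, 7, so each address from 0 to 14 is visited exactly once.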