CA1264093A - Method and apparatus for addressing a memory by array transformations - Google Patents

Method and apparatus for addressing a memory by array transformations

Info

Publication number
CA1264093A
Authority
CA
Canada
Prior art keywords
memory
data
array
intelligent
row
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired
Application number
CA000583309A
Other languages
French (fr)
Inventor
Sun-Chi Siu
Alan J. Deerfield
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Micron Technology Inc
Original Assignee
Raytheon Co
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CA000504551A (patent CA1250370A)
Application filed by Raytheon Co
Priority to CA000583309A (patent CA1264093A)
Application granted
Publication of CA1264093A
Expired (legal status)

Landscapes

  • Complex Calculations (AREA)

Abstract

A memory having an address generator in an intelligent port which generates address sequences specified by an array transformation operator in a programmable processor, thereby allowing a controlling processor to proceed immediately to the preparation of the next instruction in parallel with memory execution of a present instruction. The intelligent port of the memory creates complex data structures from input data arrays stored in memory and directs the transformation of the data structures into output data streams. The memory comprises a plurality of read-write memory banks and a bank of read-only memory interconnected through intelligent ports and busses to other units of the processor. An arbitration and switching network assigns memory banks to the intelligent ports.

Description

The Government has rights in this invention pursuant to Contract No. N62269-82-C-0492 awarded by the Department of the Navy.

Background of the Invention

This invention relates generally to addressing a memory of a digital system, and in particular to the method and apparatus for addressing a memory by a set of parameters which specify an addressing sequence within the memory for data arrays.
In many digital processing systems, specially programmed units control the ordering of data access from memories while special purpose interfacing units link up components with different data formatting requirements. The proliferation of special purpose units results in inefficiencies, causing high system development costs, long development times, high programming costs, and high system maintenance costs.
The arithmetic unit (AU) in prior processors usually assisted the processor control unit to sequence data items to the arithmetic section or transform data items into a form appropriate for an operation to be performed; this not only increased the complexity of the control unit, but also interrupted data processing, causing a reduction in processor efficiency. In addition, often the AU is idle while the next instruction is being interpreted. It is desirable to continuously control formatting operations over related data items, like arrays, and to let the AU perform continuous AU functions.
Sometimes special purpose instructions are implemented in digital signal processors to facilitate performing vector-matrix mathematics. Generally, a series of instructions is required to perform a signal processing algorithm using the available special purpose instructions and other instructions for correctly indexing and dimensioning arrays. A higher order language that eliminates the need for ancillary parameters to index and dimension arrays is highly desirable, especially when the hardware required to implement such a language is not prohibitive.

Summary of the Invention

In accordance with the present invention, a method is provided for generating addressing sequences for arrays, including a vector, a matrix or a block, in a memory specified by an array transformation operator comprising a plurality of parameters and under the control of said memory, the method comprising the steps of loading an address generator portion of the memory with a plurality of parameters of the array transformation operator including an initial address parameter, generating a plurality of row and column indices specified by displacement and length parameters of the array transformation relative to the initial address parameter and translating each pair of the row and column indices into an address for the memory. The method further comprises the step of interpreting a boundary parameter of the array transformation for controlling or modifying the generating of the addressing sequences when one of said displacement parameters causes the address to be generated outside a boundary of an array. The boundary parameter specifies one of the boundary modes of operation comprising a wrap-around mode, a zero-fill mode and an ignore boundaries mode.
In accordance with a further feature of the invention a method is provided for generating addressing sequences in a memory for an array specified by parameters of an array transformation comprising the steps of loading an index generator means in an address generator of the memory with an initial address specified by an initial address parameter, delta 0, of the array transformation, loading the index generator means with a displacement parameter of the array transformation, loading a counter in the address generator with a length parameter, L1, of the array transformation and generating a plurality of addresses by incrementing the initial address parameter in the index generator with the displacement parameter a number of times equal to the length parameter. This method of the invention produces a line or vector of L1 addresses. Further methods in accordance with this invention produce a sequence of addresses for arrays of a two dimensional matrix or a three dimensional block.
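The vector case described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the hardware of the invention: the function name `vector_addresses` and the use of plain integers for addresses are assumptions made here.

```python
def vector_addresses(delta0, delta1, length1):
    """Return a line of `length1` addresses.

    delta0  -- initial address (the "delta 0" parameter)
    delta1  -- displacement added after each address is produced
    length1 -- number of addresses to produce (the "L1" parameter)
    All three names are hypothetical stand-ins for the loaded registers.
    """
    addresses = []
    address = delta0
    for _ in range(length1):
        addresses.append(address)
        address += delta1  # step to the next element of the vector
    return addresses


print(vector_addresses(100, 4, 5))  # [100, 104, 108, 112, 116]
```

A negative `delta1` walks the vector in the reverse direction, which is one way a reversed access could be expressed with the same three parameters.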
In accordance with one embodiment of the invention, a memory having a plurality of ports is provided with at least three of said ports being intelligent ports comprising an address generator for generating a sequence of addresses for arrays including a vector, a matrix or a block in response to parameters of an array transformation operator. The parameters are loaded into an indices generator by a displacement control word and a length control word for generating a plurality of row and column indices relative to an initial reference point of a single or multi-dimensional array. An address translator coupled to the indices generator converts each pair of row and column indices into an address. A microprogrammed sequencer controls the operation of the address generator. Each of the intelligent ports may operate in a read or write mode.
In accordance with a further feature of the embodiment of the invention an address generator of an intelligent memory port is provided comprising an index generator means for generating a plurality of row and column indices, the index generator means being loaded with an initial address specified by an initial address parameter, delta 0, of an array transformation, means for storing in the index generator means a displacement parameter, delta 1, of the array transformation, means for storing in the address generator a length parameter, L1, of the array transformation, means for incrementing the initial address with the displacement parameter a plurality of times equal to the length parameter, and means for translating the row and column indices into a plurality of addresses. The index generator means comprises a plurality of row index generators and column index generators. This embodiment generates a line or vector of L1 addresses, and it may also generate addresses for a plurality of arrays of two or more dimensions.

According to a first broad aspect, the present invention provides a memory comprising: means for storing data; a plurality of read/write port means, said port means comprising means for transferring data to and from a bus means in accordance with addressing sequences specified by an array transformation; and switching network means coupled between said storing means and said port means for routing data transfers between said plurality of storing means and said port means.
According to a second broad aspect, the present invention provides an intelligent memory comprising: means for storing data; port means for transferring data to and from said memory in accordance with addressing sequences specified by an array transformation; said port means comprising an address generator for generating said addressing sequences as specified by said array transformation; network means for coordinating data transfers between said plurality of port means and said storing means; a data formatter for packing and unpacking data to and from said storing means; and a memory controller coupled to said address generator and said network means for controlling said address generator and said data formatter.

Brief Description of the Drawings

The above-mentioned aspects and other features of the invention are explained more fully in the following description taken in connection with the accompanying drawings in which:
FIG. 1 is a block diagram of a Macro Function Signal Processor (MFSP) utilizing an intelligent memory device of the present invention.
FIG. 2 is a block diagram of an intelligent memory and its interfacing busses.
FIG. 3 is a block diagram of the invention comprising an intelligent memory port.
FIG. 4A and FIG. 4B show the control word formats for specifying an array transformation.
FIGS. 5A-5E show the sequences of factored addressing for transposing an array specified by an array transformation.
FIG. 6 illustrates three boundary modes and functions when an address generator displacement encounters the right edge of a matrix.
FIG. 7 illustrates directions of i, j and k displacements on an I; J; K block.
FIG. 8 shows some initial points within a block for wrap-around boundary.
FIG. 9 shows some initial points in the zero-fill boundary mode.


FIG. 10 is a block diagram of the address generator of the present invention.
FIG. 11 shows the location of the array transformation parameters within the upper and lower matrix access chips.
FIG. 12 is a block diagram of one of the matrix access chips in the indices generator of the address generator.
FIG. 13 is a block diagram of one of the index generators in the matrix access chip.
FIG. 14 is a block diagram of one of the length counters in the matrix access chip.
FIG. 15 illustrates a matrix having row and column boundaries and reference zones outside said boundaries.
FIG. 16 shows one illustrative sequence of elements generated by a matrix access chip to define an output shape.
FIG. 17 illustrates index adjustment from outside array boundaries.
FIG. 18 shows a Digital Fourier Transform coefficient matrix illustrating that the exponent of w is the product of the row and column index.

Description of the Preferred Embodiment

Referring to FIG. 1, there is shown a block diagram of a Macro Function Signal Processor (MFSP) 10 illustrating an overall system in which an intelligent memory 12 may be used comprising an array transformation address generator implementation of the present invention. More particularly, the intelligent memory 12 may have a plurality of intelligent ports 14-22, although in the MFSP 10 embodiment described herein two of the ports 14 and 22 are simply serving as direct memory access (DMA) ports known to one skilled in the art. Port1 16, port2 18 and port3 20 are the intelligent ports in the system of FIG. 1, primarily because of their ability to execute addressing sequences based on an array transformation operator while completely hiding data attribute considerations from an arithmetic unit 38.
In addition to the intelligent memory 12, the MFSP 10 comprises the arithmetic unit 38, a control processor 32, a node control unit 24, a System I/O unit 40, an I Bus 26, an S Bus 28 and an A Bus 30. Intelligent port2 18 and intelligent port3 20 of the intelligent memory 12 each have 32-bit direct connections to inputs of the arithmetic unit 38, and a 64-bit output of the arithmetic unit 38 connects to intelligent port1 16, thereby providing the means for streaming data to and from the arithmetic unit 38. The A bus 30 not only interconnects the three intelligent ports 16-20, but in addition serves as the MFSP 10 internal control bus interconnecting a 2-port RAM 34 of control processor 32 and the arithmetic unit 38. The control processor 32 comprises the 2-port RAM 34 and a command interpreter 36 for interpreting instructions and setting up the intelligent memory 12 and the arithmetic unit 38 for execution of said instructions.
The system bus, S Bus 28, interconnects a plurality of units including the node control unit 24, port0 14, port4 22, 2-port RAM 34, command interpreter 36, arithmetic unit 38 and system I/O unit 40. DMA port0 14 is connected to the node control unit 24 via the I Bus 26. The node control unit 24 provides an interface between networks of high speed busses in a distributed multiprocessor and a processor or other device such as the MFSP 10. DMA port4 22 has a direct 32-bit connection to the system I/O unit 40 for system application I/O information transfer.
Referring now to FIG. 2, there is shown a block diagram of the intelligent memory 12 comprising five memory banks 52-60, an arbitration and switching network 62 and a plurality of ports 14-22, including the three intelligent ports 16-20 and two DMA ports 14 and 22. The arbitration and switching network 62 directs the flow of data between the ports 14-22 and the memory banks 52-60; it provides an 88-bit wide, 5 X 5 crossbar switch with arbitration logic for the five inputs (ports) and five outputs (memory banks). Although a particular embodiment of the intelligent memory is described here for the MFSP 10, the invention is not limited to a specific number of memory banks nor to a specific number of intelligent ports nor to a specific size crossbar switch.
Still referring to FIG. 2, the 5 X 5 crossbar handles 64 bits for data and 24 bits for address and control from each of the ports 14-22. The arbitration logic resolves conflicts between the ports 14-22. If more than one port requests the same bank of memory, the arbitration logic holds off the lower (fixed) priority port until the higher priority port completes its transfer. Arbitration is performed on a cycle by cycle basis. Four of the memory banks, bank0 52, bank1 54, bank2 56 and bank3 58, are random access memories (RAM), each organized as 64K words by 64 bits. Memory bank4 60 is a read-only memory (ROM) organized as 16K words by 64 bits.
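The per-cycle arbitration rule can be illustrated with a small Python sketch. The text states only that priority is fixed; the ordering used here (a lower port number wins) is an assumption, as are the function name `arbitrate` and the dictionary representation of the requests.

```python
def arbitrate(requests):
    """Resolve one cycle of bank requests.

    requests -- dict mapping port number -> requested memory bank number.
    Returns (granted, held): `granted` maps each winning port to its bank,
    `held` lists the ports that must wait for a later cycle.
    ASSUMPTION: a lower port number means a higher fixed priority.
    """
    bank_owner = {}  # bank -> port that won it this cycle
    held = []
    for port in sorted(requests):  # visit ports from highest priority down
        bank = requests[port]
        if bank in bank_owner:
            held.append(port)      # bank already taken by a higher priority port
        else:
            bank_owner[bank] = port
    granted = {port: bank for bank, port in bank_owner.items()}
    return granted, held


granted, held = arbitrate({1: 0, 2: 0, 3: 4})
print(granted, held)  # {1: 0, 3: 4} [2]
```

Because the function is called once per cycle, a held port simply presents the same request again on the next call.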
Two of the RAM memory banks 52, 58 are used primarily by port2 18 and port3 20, which are primarily used as READ ports. The other two RAM memory banks are scratch pad areas of memory and as such are used primarily for storing intermediate values via port1 16, which is primarily used as a WRITE port. ROM memory bank 60 stores constants and approximation tables for use during the operation of various macro functions.
The DMA ports 14 and 22 act as an interface between the I Bus 26 or the system I/O unit 40 and the intelligent memory 12 for the purpose of transferring large blocks of data; the blocks of transferred data reside in consecutive locations in the intelligent memory 12. The DMA ports 14 and 22 also interface with the S Bus 28 for control and status, the I Bus 26 or the system I/O unit 40 for transfer block data accesses external to the intelligent memory 12 and to the arbitration and switching network for transfer block data accesses to the intelligent memory 12.
The intelligent ports, port1 16, port2 18 and port3 20, have independent controls to address and format each data element. Each port's setup parameters describe the shape of the packed data, where it is to be found (i.e. base address), and the method of access (i.e. read/write, transposed, reversed, etc.). When a port is started, it begins accessing the first element of the described data and continues until all data is read or written regardless of errors. The read ports and write ports are identical except for port ID bits for determining the port's function.
Referring now to FIG. 3, there is shown a block diagram of one of the intelligent memory ports 16-20 comprising an address generator 100, a memory controller 102 and a data formatter 104. The address generator 100 produces addresses to access a data array in memory banks 59 so that the data array can appear in various convenient shapes for arithmetic unit 38 manipulation. The data formatter 104 acts as a data translator between the memory banks 59 and the arithmetic unit 38. Data is packed into 64-bit words in the memory bank 59. Packed data is unpacked and left justified in the data formatter 104 prior to its use in arithmetic unit 38.
The memory controller 102 provides control for the address generator 100 and the data formatter 104, as well as providing (control line) interfaces to the arithmetic unit 38. In addition, the memory controller 102 is coupled to the arbitration and switching network 62 and it initiates and controls all memory accesses of an intelligent memory port.
The intelligent memory 12 requires a small set of parameters to execute any addressing sequence required by a high-level signal processing language such as a Macro Function Language (MFL) described hereinbelow. These parameters have been integrated into a single addressing operator called an array transformation that directly specifies the hardware control parameters from the signal processing language syntax.
The address generator 100 implements the address functions specified by the array transformation. A pair of 16-bit control words, the displacement control word 80 and length control word 90, as shown in FIGS. 4A and 4B and described hereinbelow, contain array transformation parameters and initialize address registers within the address generator 100 of any one of the intelligent ports 16-20 which then proceeds to execute memory address sequences specified by the array transformation.

Prior to describing further the structure and operation of the invention in conjunction with the drawings, it is necessary at this point to describe the array transformation operator and certain aspects of the language used to specify the parameters in the array transformation in order to understand the invention. The address generator 100 of the intelligent memory 12 as previously noted is functionally specified by the array transformation operator. The word "operator" is used here in the general mathematical sense of an entity that transforms an input into an output according to the definition of the operator. The input and output are both arrays, hence the name "array transformation." This operator describes array addressing in terms of a factored series of nested addressing sequences. An array transformation comprises ten parameters for specifying an operation and has the following syntax which is described below:

    [Δ4 Δ3 Δ2 Δ1 | Δ0]
    [L4 L3 L2 L1 | B ]

The language syntax corresponds directly to the parameters required to initialize the address generator 100 component in the intelligent memory 12. As a result, the mathematical definition of the array transformation operator serves as the hardware definition of the address generator 100.
An intelligent memory 12 becomes possible when the technique of instruction factoring is used in conjunction with the separation of data parameters from the processing program. Instructions are factored into control operators, variable functions, array modifiers, and operands. Each of these with the exception of functions has a significant effect upon memory operation. Control operators act as an addressing control mode to determine the sequence(s) of applying operand-data to the arithmetic unit 38. Control operators specify relationships between array transformations such as length parameters. Variable functions specify the arithmetic and logical operations to be performed on the elements fetched from memory. Array modifiers alter the normal addressing mode specified by the control operator in use. Operands refer to the specific data to be used.
When these instruction parameters are intentionally separated from the parameters of data, the data parameters can be maintained in a data descriptor. A data descriptor would consist of a collection of information describing the variable operands, such as data type, format and location. At run-time, a program references an operand through the variable's descriptor. Dynamic changes in data "shape" can be handled with no changes affecting the program. One important requirement of an intelligent memory is the treatment of an entire variable data array as a single operand. Consequently, the location of the data is determined by a base address or an initial reference point which references the initial element of the array of data.
Signal processing algorithms are conveniently expressed in the language of matrix mathematics. For this reason, MFL is an array-oriented language. Most variables in MFL programs denote many elements of related data to be treated as single entities. Most operations are defined directly on arrays without requiring item-by-item statements. MFL arrays take the form of vectors, matrices and blocks. Referencing an individual element of an array requires one, two or three numbers called indices to mark its position in the array.
A vector is an array whose elements are selected by a single index. In other words, a vector has one coordinate and is considered to be a collection of elements arranged in a line. The number of elements in this line is called the length of the vector. A null vector is a vector containing no elements. The length of a null vector is zero.
A matrix is an array whose elements are selected by two indices. It has two coordinates and is considered to be a collection of elements arranged in a rectangle. The number of elements in each row is called the row length. The position of an element along the row coordinate is called the row position or the column number. The number of elements in each column is called the column length. The position of an element along the column coordinate is called the column position or row number. Together, the row length I and column length J constitute the "shape" of the matrix. The shape is written J;I.

In some applications the matrix has a direct correspondence to some physical reality such as the data derived from an array of pixels which represent a planar image. In these cases the properties of a matrix are directly applicable to the processing. In other cases the matrix is simply a convenience for purposes of processing. In many signal processing operations, a matrix is merely a collection of vectors. The shape is consistent with the number of row vectors and the length of each vector (which is equivalent to the number of columns). The processing requires an iterative use of the vector set using and modifying one vector at a time. Consequently the usual interpretation of an array modifier is to apply it to each row vector rather than to the entire array or matrix. Of course it is sometimes necessary to modify the complete array as well. In this case the array can be considered as a single long vector.
A block is an array whose elements are selected by three indices. It has three coordinates and is considered to be a collection of elements arranged in a set of matrices. A block uses the row and column terminology defined previously for matrices. In addition, the number of matrices in the block K is called the depth of the block. The position of an element along this coordinate is called the depth position or the matrix number. Together, the row length, column length and depth constitute the shape of the block. The shape of the block is written K;J;I.
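To make the shape notation concrete, the following Python sketch maps a (depth; column; row) position inside a K;J;I block to a linear offset from the base address. It assumes row-major storage with one element per address, which is a simplification made here (the memory described above packs data into 64-bit words), and the name `linear_address` is hypothetical.

```python
def linear_address(base, shape, position):
    """Translate an index triplet into a linear address.

    shape    -- (K, J, I): depth, column length, row length
    position -- (k, j, i): matrix number, row number, column number
    ASSUMPTION: row-major order, one element per address.
    """
    K, J, I = shape
    k, j, i = position
    if not (0 <= k < K and 0 <= j < J and 0 <= i < I):
        raise IndexError("position lies outside the K;J;I shape")
    # Each matrix holds J rows of I elements; each row holds I elements.
    return base + (k * J + j) * I + i


print(linear_address(0, (2, 3, 4), (1, 2, 3)))  # 23, the last element of a 2;3;4 block
```

A matrix of shape J;I is the special case K = 1, and a vector of length I is the case K = J = 1.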
To illustrate the concept of factored addressing, the following example qualitatively describes array transposition in terms of factored addressing. When a matrix in row-major order is read by columns instead, the output will be a matrix with rows consisting of columns from the input array. The output array is of rank two, implying that the procedure must require two displacement sequences.
Referring now to FIG. 5, let I equal the length of rows in the input array and let J equal the length of columns. The array transformation is described by the following sequences:
(0) Start at the upper left corner of the array.
(1) Move down one point in the column direction J - 1 times to define a line of J points. Return to the point before the first displacement in the column direction.
(2) Move across one point in the row direction and repeat step (1) I - 1 times to define an I x J matrix.
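The three steps above can be sketched in Python as two nested loops over a row-major input. This is only an illustration of the read order, under the assumption that the element at row number r and column number c sits at offset r * I + c; the function name `transpose_read_order` is hypothetical.

```python
def transpose_read_order(I, J):
    """Return the offsets visited when a row-major J;I matrix is read by columns.

    I -- row length (number of columns)
    J -- column length (number of rows)
    """
    order = []
    for col in range(I):          # step (2): move across the row direction
        for row in range(J):      # step (1): move down the column direction
            order.append(row * I + col)
        # leaving the inner loop models the return to the top of the column
    return order


# A 2;3 matrix stored as offsets [[0, 1, 2], [3, 4, 5]] read by columns:
print(transpose_read_order(I=3, J=2))  # [0, 3, 1, 4, 2, 5]
```

Grouping the output in lines of J points gives the I rows of the transposed matrix.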
The sequences illustrated in the preceding example may be generalized into a procedural definition of the array transformation:
(0) Go to the initial point delta 0 (Δ0).
(1) Move by displacement delta 1 (Δ1) a total of L1 - 1 times to define a line of L1 points. Return to the point before the first displacement of this sequence.
(2) Move by displacement delta 2 (Δ2) and repeat step one L2 - 1 times to define a matrix of L2 x L1 points. Return to the point before the first displacement of this sequence.
(3) Move by displacement delta 3 (Δ3) and repeat steps one and two L3 - 1 times to define a block of L3 x L2 x L1 points. Return to the point before the first displacement of this sequence.
(4) Move by displacement delta 4 (Δ4) and repeat steps one, two and three L4 - 1 times and stop. The result is a set of L4 blocks of L3 x L2 x L1 points.
The rank of the output array is equal to the number of displacement sequences required to generate it. With the above definition, output arrays of up to rank four are possible.
Each nested sequence corresponds to a separate hardware circuit. When necessary, more sequences may be added to the definition to produce shapes of higher rank.
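The procedural definition can be modelled as a recursive generator in Python. This is a software sketch under stated assumptions, not the hardware circuit: points are represented as (row number, column number) pairs, boundary handling is ignored, and the name `factored_sequence` is hypothetical.

```python
def factored_sequence(start, deltas, lengths):
    """Yield the points of a nested (factored) addressing sequence.

    start   -- initial point delta 0 as a (row number, column number) pair
    deltas  -- displacements ordered innermost first: [d1, d2, d3, d4]
    lengths -- repeat counts in the same order:       [L1, L2, L3, L4]
    """
    if not deltas:
        yield start
        return
    *inner_deltas, delta = deltas
    *inner_lengths, length = lengths
    point = start
    for _ in range(length):
        # Run every inner sequence from the current point. The inner call
        # works on its own copy, so control "returns to the point before the
        # first displacement" of the inner sequence automatically.
        yield from factored_sequence(point, inner_deltas, inner_lengths)
        point = (point[0] + delta[0], point[1] + delta[1])


# Transposing a 2;3 matrix: d1 moves down a column, d2 moves across a row.
points = list(factored_sequence((0, 0), [(1, 0), (0, 1), (0, 0), (0, 0)], [2, 3, 1, 1]))
print(points)  # [(0, 0), (1, 0), (0, 1), (1, 1), (0, 2), (1, 2)]
```

The recursion accepts any number of levels, which mirrors the remark that more sequences may be added to produce shapes of higher rank.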
In a Macro Function Language (MFL), an array transformation on an input array C is specified by ten parameters written directly below the name of the array and its data descriptor according to the following syntax:

    C16 10;20 [Δ4 Δ3 Δ2 Δ1 | Δ0]
              [L4 L3 L2 L1 | B ]

The parameters fall into three categories: displacements (Δ), lengths (L) and boundary modes (B).
Referring now to FIG. 6, the boundary mode (B) parameter of an array transformation sets a geometrical context for the displacements and lengths. The boundary mode determines the action of the address generator 100 if a given displacement results in an address that falls outside the boundaries of the array. The boundary modes are as follows:
W = wrap-around. When a boundary is encountered by a displacement, the address generator continues to read on the other side of the array. For example, when reading from left to right along a row, the address generator 100 moves back to the left-most element of the row if the right edge is encountered.
Z = zero-fill. All points outside the array are assumed to be zero for a read port and valid data from the AU 38 is dropped for a write port.
I = ignore boundaries. This suffix may be appended to either zero-fill or wrap-around. In this case, all boundaries except the last point in the array are ignored. So wrap-around with this suffix moves the address pointer back to the head of the array, whereas zero-fill keeps the pointer going.
FIG. 6 shows the effect of each boundary mode on an address generator displacement encountering the right edge of a matrix.
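The wrap-around and zero-fill modes for a read port can be sketched as follows. This is a minimal illustration assuming the matrix is held as a list of rows; the "ignore boundaries" suffix is not modelled, and the function name `read_element` is hypothetical.

```python
def read_element(matrix, j, i, mode):
    """Read the element at row number j, column number i under a boundary mode.

    mode "W" -- wrap-around: an index past an edge re-enters on the other side
    mode "Z" -- zero-fill: every point outside the array reads as zero
    """
    J = len(matrix)       # column length (number of rows)
    I = len(matrix[0])    # row length (number of columns)
    if mode == "W":
        return matrix[j % J][i % I]
    if mode == "Z":
        inside = 0 <= j < J and 0 <= i < I
        return matrix[j][i] if inside else 0
    raise ValueError("unsupported boundary mode: " + mode)


m = [[1, 2, 3],
     [4, 5, 6]]
print(read_element(m, 0, 3, "W"))  # 1: stepping past the right edge wraps to the left-most element
print(read_element(m, 0, 3, "Z"))  # 0: the same point is outside the array
```

For a write port in zero-fill mode the corresponding behaviour would be to drop the value instead of returning zero.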
Displacements of an array transformation are iteratively added to an initial point to generate addresses in a regular sequence. The displacements are defined as pairs of numbers representing generalized spacing on a two-dimensional surface, thereby facilitating detection of the endpoints of each row and column of a matrix. This representation is appropriate for most signal processing macros. If desired, the concept may be extended to n-tuples for general displacements on an n-dimensional array. Displacements on arrays may be written as complex numbers where the imaginary part is the displacement in the row direction and the real part is the displacement in the column direction. Real number displacements are in the row direction with no displacement in the column direction. Either form may be used with vectors, matrices and blocks. Symbols may also be used. As shown in FIG. 7, "i" defines a unit displacement to the right across the row direction, "j" defines a unit displacement down the column direction, and "k" defines a unit displacement into the depth direction. Displacements by multiples of either i, j or k are indicated by preceding the symbol with the size of the displacement. For example, -5j denotes a displacement by five points in the negative column direction.
A general displacement through a block requires a triplet of numbers in a particular form (e.g. depth; column; row). For most applications, the k direction does not require the same level of flexibility given to the j and i directions. Data formed into a three-dimensional block usually can be treated as a set of matrices. In these instances, two-dimensional displacements on each matrix along with sequential accessing of each matrix in the block are sufficient. As a result, displacement triplets through a block are not always directly supported in hardware. However, devices implementing larger n-tuples might be advantageous for some applications. A variable displacement value may be stored in a co-operand to the array transformation. Presence of an explicit value in the co-operand is indicated by a "d" symbol in the corresponding displacement of the array transformation. The "d" is used for a user-defined variable, and may also be defined by a translator in order to insert a non-coded constant into the appropriate register of the address generator 100. The displacement notations of an array transformation along with their interpretation in terms of Δ1 through Δ4 are summarized as follows:
Xi = Move across the row X points to the right.
Yj = Move Y points down the column.
Zk = Move Z points into the depth.
0 = Do not displace -- repeat the previous sequence.
d = The displacement is contained in a variable co-operand to the array transformation.
b;a = Move "a" points along the row and "b" points down the column.
c;b;a = Move "a" points along the row, "b" points down the column, and "c" points into the depth.
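The notation summarized above can be turned into a small parser. The sketch below is an illustration only: it converts each written form into a (depth; column; row) triplet, treats a bare real number as a row displacement, and refuses "d" because that value lives in a co-operand. The function name `parse_displacement` is hypothetical.

```python
def parse_displacement(text):
    """Convert displacement notation into a (depth, column, row) triplet."""
    text = text.strip()
    if text == "d":
        raise ValueError("the value of 'd' is held in a co-operand")
    if ";" in text:
        # Forms "b;a" and "c;b;a": pad missing leading coordinates with zero.
        parts = [int(p) for p in text.split(";")]
        while len(parts) < 3:
            parts.insert(0, 0)
        return tuple(parts)
    if text.lstrip("+-").isdigit():
        # A real number (including 0) displaces along the row only.
        return (0, 0, int(text))
    axis, count = text[-1], text[:-1]
    if count in ("", "+"):
        size = 1
    elif count == "-":
        size = -1
    else:
        size = int(count)
    triplet = [0, 0, 0]
    triplet[{"k": 0, "j": 1, "i": 2}[axis]] = size
    return tuple(triplet)


print(parse_displacement("-5j"))    # (0, -5, 0)
print(parse_displacement("-1;-1"))  # (0, -1, -1)
```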
The instruction loading time is reduced by assigning codes to the most frequently required displacements. One possibility is to choose three-bit codes for ±i, ±j, ±k, or 0. The eighth code is for "d", to signify that the displacement value is a separate complex number sent by the command interpreter 36. All variable displacements and constants not equal to ±i, ±j, ±k, or 0 are sent to the address generator 100 as "d"'s. In a two-dimensional address generator, each "d" is replaced with a complex number denoting a general displacement in the combined i and j directions. If either the real or imaginary part is zero, the displacement is solely in the i or j direction respectively. "k" displacements of the form Zk are specified as (Z x J)j, on an array reshaped to (K x J);I. Hardware directly supporting the full three-dimensional representation would require "d"'s in the form of triplets. With the three-bit codes, four displacements and the initial point can be placed into a single 16-bit word as shown in FIG. 4A.
Referring now to FIG. 8 and FIG. 9, the initial point is a displacement from 0;0;0, the upper left corner of the array. For negative values, its location depends on the boundary mode. FIGS. 8 and 9 compare the different locations of some initial points for wrap-around and zero-fill.
For wrap-around, the displacements to a few of the corners of a block are as follows:
-i = upper right corner
-j = lower left corner
-k = upper left deep corner
-1;-1 = lower right corner
0 = upper left corner
For zero-fill, all of these points are clustered about the upper left corner of the array as shown in FIG. 9. The zero-fill mode interprets the array as having infinite extent with elements outside the K;J;I shape set to zero.
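The effect of the two boundary modes on a negative initial point can be illustrated with a minimal sketch. It is not taken from the disclosure: the function name `read_element`, the mode strings and the use of a modulo operation to model wrap-around are assumptions made for illustration only.

```python
def read_element(matrix, row, col, mode="wrap"):
    """Read one element, resolving out-of-range indices by boundary mode.

    mode="wrap": indices wrap around the J;I shape (modelled here with modulo).
    mode="zero": the array is treated as infinite, with zeros outside the shape.
    """
    rows, cols = len(matrix), len(matrix[0])
    if mode == "wrap":
        return matrix[row % rows][col % cols]
    if mode == "zero":
        if 0 <= row < rows and 0 <= col < cols:
            return matrix[row][col]
        return 0
    raise ValueError("mode must be 'wrap' or 'zero'")


sample = [[1, 2, 3],
          [4, 5, 6]]

upper_right = read_element(sample, 0, -1, "wrap")   # initial point -i
lower_left = read_element(sample, -1, 0, "wrap")    # initial point -j
lower_right = read_element(sample, -1, -1, "wrap")  # initial point -1;-1
outside = read_element(sample, 0, -1, "zero")       # zero-fill: outside the shape
```

In wrap-around mode the three negative initial points land on the corners listed above, whereas in zero-fill mode the same points lie just outside the upper left corner and read as zero.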
Lengths of an array transformation are real integers indicating the number of times a displacement is performed and the resulting shape of the array. Several mnemonics are provided to specify output lengths in terms of the input shape of the array. A capital letter "K" indicates the depth, "J" indicates the column length and "I" indicates the row length of the input array. Other numeric lengths must be written explicitly in the transformation if they are constant and in a co-operand if they are variable in a form similar to explicit displacements.
For most numerical applications, two ports simultaneously access data to create a pair of input streams for a dyadic function. If a displacement must occur as many times as necessary to match the corresponding displacements of the other input argument to a dyadic function, the number "1" is written. This symbol may be overwritten at run time by the command interpreter 36 with the appropriate length to match the other argument's array transformation, or port1 16 and port2 18 can monitor the corresponding port length to determine the replacement length dynamically. If the two array transformations have corresponding lengths equal to one, no displacement occurs for the sequence. This is how a control operator becomes two coupled array transformations.
A displacement may be specified to continue until a boundary of the array is encountered. The length S, meaning "stop on boundary," is used for this case. In wrap-around mode, if length m is set to S, displacements by Δm continue until a Δm encounters a boundary. In zero-fill mode, displacements by Δm continue until any lower level displacement encounters a boundary.
When stop on boundary is used as a length parameter of a port addressing one of the input data streams of a dyadic function, the corresponding length in the other port must be "1", meaning "repeat until the other array stops on its length." The following summarizes the array transformation length symbols:
K = depth of input array
J = column length of input array
I = row length of input array
d = the length is contained in a variable
1 = repeat as necessary to match a corresponding output array shape
S = stop displacing when a boundary is encountered
These six lengths are specified in terms of three-bit codes similar to displacements. The entire length field and boundary mode occupy a single 16-bit word as shown in FIG. 4B.
The following examples show a few simple forms of array operations possible with array transformations. Each example is given in terms of a Macro Function Language (MFL) syntax. The array transformation is applied to an array B or C of some sample shape and type contained in its data descriptor. Below the array transformation is written the data descriptor of the output array from the transformation. The parentheses enclosing this information indicate that it is a response from the intelligent memory 12.
Normal read. The ordering of arrays in row-major order with blocks consisting of sets of matrices implies that normal accessing of data from memory will be:
C16 2;8;4
[0 k j i|0]
[1 K J I|W]
(C16 2;8;4)
The transformation is applied in this example to a block of 16-bit complex data shaped 2;8;4. The notation specifies that the array access begins at the initial point 0, the upper left corner of the input array. The primary displacement is i, the row direction, to form an output row of length I. Next the access moves by j, the column direction, and repeats the first set of displacements to form an output matrix shaped J;I. The third displacement sequence is by k, the depth, followed by repetition of the column and row displacements to produce the final K;J;I output shape. The fourth displacement is a repeat with length 1 to allow for repetition of the block shape to match a corresponding output array of rank four, if necessary. Since this transformation left the array in its original form, it is considered the identity operator.
Block Transposition. Some algorithms require that a block of data is to be read by depth, then by column, for each row. In this case, the array transformation is as follows:
B
C16 2;8;4
[0 i j k|0]
[1 I J K|W]
(C16 4;8;2)
The array access begins at the initial point 0, the upper left corner of the input array. The primary displacement is k, the depth direction, to form an output row of length K. Next the access moves by j, the column direction, and repeats the first set of displacements to form an output matrix shaped J;K. The third displacement sequence is by i, the row direction, followed by repetition of the column and depth displacements to produce the final I;J;K output shape.
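The normal read and the block transposition can both be viewed as nested displacement sequences. The following is a minimal sketch of that view, not the hardware method: the function name `transform` and its argument layout are hypothetical, the outermost repeat (Δ4) is omitted, and only in-bounds accesses are modelled, so no boundary mode is involved.

```python
from itertools import product


def transform(shape, displacements, lengths, initial=(0, 0, 0)):
    """Yield (k, j, i) index triplets for an array transformation.

    shape         -- input shape as (K, J, I)
    displacements -- unit displacements, outermost first (e.g. ["k", "j", "i"])
    lengths       -- matching lengths: "K", "J", "I" or an explicit integer
    """
    unit = {"i": (0, 0, 1), "j": (0, 1, 0), "k": (1, 0, 0), "0": (0, 0, 0)}
    size = {"K": shape[0], "J": shape[1], "I": shape[2]}
    steps = [unit[d] for d in displacements]
    counts = [size[n] if isinstance(n, str) else n for n in lengths]
    # The rightmost (primary) displacement varies fastest.
    for counters in product(*(range(c) for c in counts)):
        point = list(initial)
        for n, step in zip(counters, steps):
            for axis in range(3):
                point[axis] += n * step[axis]
        yield tuple(point)


shape = (2, 8, 4)
normal = list(transform(shape, ["k", "j", "i"], ["K", "J", "I"]))
transposed = list(transform(shape, ["i", "j", "k"], ["I", "J", "K"]))
```

Under these assumptions `normal` visits the block in row-major order, while `transposed` runs through the depth first, then the column, and only then advances along the row.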
Matrix transposition. The following array transformation transposes matrix B so that rows become columns and columns become rows. The R16 means that the array shaped 10;20 contains real 16-bit data.
B
R16 10;20
[0 0 i j|]
[1 1 I J|]
(R16 20;10)
In this and subsequent matrix examples, it is understood that for blocks, the array transformation would be repeated for each matrix of the block.
Inner product addressing. The addressing for a matrix multiply matches the row vectors of B against the column vectors of C. Each row vector of B is repeated to match each of C[I] column vectors in C before the next row of B is read. The array transformations are as follows:
B                    C
R16 10;20            R16 20;15
[0 j 0 i|0]          [0 0 i j|]
[1 J 1 I|W]          [1 1 I J|W]
(R16 10;15;20)       (R16 10;15;20)
The first sequence in the B array transformation is by i for length I, meaning that a row vector from B is read. The corresponding sequence for C reads a column, implying that B[I], the row length of B, and C[J], the column length of C, must be identical. The second sequence for C is i for length I, meaning that the columns of C are read consecutively. The row vector of B is repeated to match the number of columns in C, as indicated by the 0 displacement and the 1 length. This matching is repeated for every row vector in B, as indicated by the j for length J in the third sequence of B and the corresponding repeat for C.
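The pairing of the two input streams described above may be sketched as follows. This is an illustrative model only: the function names are hypothetical, the matrices are plain Python lists, and the multiply-accumulate that an arithmetic unit would perform on the paired streams is modelled by ordinary sums.

```python
def inner_product_streams(b, c):
    """Yield (B element, C element) pairs in inner product addressing order."""
    rows_b, row_len_b = len(b), len(b[0])   # B[J], B[I]
    col_len_c, cols_c = len(c), len(c[0])   # C[J], C[I]
    if row_len_b != col_len_c:
        raise ValueError("B[I] must equal C[J]")
    for j in range(rows_b):              # third sequence of B: j for length J
        for col in range(cols_c):        # second sequence of C: i for length I (B repeats)
            for n in range(row_len_b):   # first sequence: B moves by i, C moves by j
                yield b[j][n], c[n][col]


def matmul_from_streams(b, c):
    """Reduce the paired streams into the matrix product, one dot product per run."""
    run = len(b[0])
    cols = len(c[0])
    products = [x * y for x, y in inner_product_streams(b, c)]
    sums = [sum(products[s:s + run]) for s in range(0, len(products), run)]
    return [sums[r * cols:(r + 1) * cols] for r in range(len(b))]


result = matmul_from_streams([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

Each run of B[I] consecutive pairs corresponds to one output element, which is why the two transformations share the output shape J;I;B[I] (10;15;20 in the example above).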
Several conventions may be used to abbreviate array transformations as follows:
1. When no displacement is written, assume that it is 0(1), meaning "repeat as necessary to match the other input array."
2. When no length is written with an i or -i, assume that the length is I, the row length of the variable.
3. When no length is written with a j or -j, assume that the length is J, the column length of the variable.
4. When no length is written with a k or -k, assume that the length is K, the depth of the variable.
5. When no initial point is given, assume that it is 0. When the initial point is defaulted, the vertical line used to separate the initial point from other displacements is deleted.
6. When no boundary mode is given, assume that B equals wrap-around.
7. If no array transformation is given, assume that it is a normal read:
[0 k j i|0]
[1 K J I|W]
8. An array transformation may be written with the entire displacement line omitted. In this case, the remaining line specifies the lengths in the form to be used with a normal read. To indicate that the displacement line has been omitted, the length line is written with the type and packing in the form of a new data descriptor. Semi-colons are used to separate the lengths.
B                            B
R16 3;8   is equivalent to   R16 3;8
R16 3;4                      [0 k j i|0]
                             [1 K 3 4|W]
Returning now to the description of the structure and operation of the invention, and referring now to FIG. 10, there is shown a block diagram of the address generator 100, which provides an address 130 for its intelligent port each machine cycle. The address generator 100 comprises a microsequencer 122 along with its control store 120, an indices generator 111 comprising two matrix access chips (MAC) 110 and 112, an address translator 114, a multiplier 118 and a P Bus 128 for information transfer within the address generator 100. The matrix access chips (MAC) 110 and 112 provide row and column indices specified by the parameters of an array transformation which are loaded into MAC 110, 112 via the 16-bit displacement control word 80 shown in FIG. 4A and the 16-bit length control word 90 shown in FIG. 4B for addressing each and every data element of an array.
This pair of 16-bit control words containing array transformation parameters shown in FIG. 4A and FIG. 4B initializes address registers within the address generator 100 of any one of the intelligent ports 16-20, which then proceeds to execute memory address sequences specified by the array transformation. The displacement control word 80 as shown in FIG. 4A has six fields. Five of these fields are identical and contain 3-bit codes for specifying each of the displacement parameters, delta 4 through delta 0, of an array transformation. The other field is a 1-bit fractional field (F) which when set to a 1 indicates that the least significant 5 bits of a 16-bit data bus are used to support fractional displacement. Table 1 lists the eight functions available for most of the delta fields (except for the P function).
The length control word 90 as shown in FIG. 4B has six fields. Four of these fields, length 4 through length 1, are identical and contain 3-bit codes for specifying each of the length parameters of an array transformation. The "mode field" is also 3 bits and the Boundary (B) field is 1 bit. Each of the length fields has eight possible functions and Table 2 lists these functions and their definitions. The mode field defines the modes of operation within the address generator 100 of the matrix access chips 110, 112, and Table 3 lists the eight modes. The boundary field bit, B, determines the response of the intelligent ports 16, 18, 20 when a boundary condition is encountered. When set to "0" the port performs a zero-fill mode. The boundary modes are described herein and illustrated in FIG. 6.
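The two control words each pack six fields into 16 bits (five 3-bit fields plus one 1-bit field). A minimal sketch of such packing follows. The field order within the word and the numeric code values used in the example are assumptions made for illustration; the actual layouts are those of FIG. 4A and FIG. 4B and the actual codes those of Tables 1 to 3.

```python
def pack_fields(fields):
    """Pack (value, width) pairs into one integer, first field most significant.

    Returns (word, total_bits). The field order is a hypothetical choice.
    """
    word = 0
    total_bits = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError("value does not fit in its field")
        word = (word << width) | value
        total_bits += width
    return word, total_bits


def unpack_fields(word, widths):
    """Split a packed word back into its fields, first field most significant."""
    values = []
    shift = sum(widths)
    for width in widths:
        shift -= width
        values.append((word >> shift) & ((1 << width) - 1))
    return values


# Length control word: length 4 .. length 1, mode, boundary bit (order assumed).
LENGTH_WORD_WIDTHS = [3, 3, 3, 3, 3, 1]
example_codes = [5, 2, 7, 1, 3, 0]          # arbitrary illustrative code values
length_word, bits = pack_fields(zip(example_codes, LENGTH_WORD_WIDTHS))
decoded = unpack_fields(length_word, LENGTH_WORD_WIDTHS)
```

The same widths describe the displacement control word (delta 4 through delta 0 followed by the F bit), so the sketch applies to either word once the real field order is substituted.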
Each MAC device 110, 112 may be implemented with a 180 pin, 7500 gate, CMOS gate array technology. The address translator 114 is coupled to the MAC 110 and 112 and the multiplier 118. The multiplier 118 may be embodied by model IDT7217L multiplier manufactured by Integrated Device Technology, Inc. of 3236 Scott Boulevard, Santa Clara, CA 95051. The address translator 114 converts the row and column indices to the 30 bit address 130 which points to the location of the most significant bit of a data element in the intelligent memory 12. The multiplexer 116 can be controlled so that row and column indices are multiplied together to produce FFT coefficients. The address translator may be implemented with a 144 pin, 5700 gate, CMOS gate array technology. The microsequencer 122 and its associated control store 120 directly control index generation in the MAC 110 and 112 and provide control signals for data flow within the entire intelligent memory 12 pipeline. The microsequencer may be implemented with an AS890 sequencer manufactured by Texas Instruments Incorporated of Dallas, Texas 75265. The MAC 110, 112 devices, together with the microsequencer 122, constitute the bulk of the "intelligence" of the Ports 16, 18 and 20.
The partitioning of the MAC 110, 112 logic provides a slice architecture that allows each MAC 110, 112 to independently address two dimensional variables specified by an array transformation. The intelligent memory architecture supports two independent address streams one at a time, with switching between the streams under microcode control 124 as directed by the 3 mode bits in the length control word 90. Hence, the 3 mode bits provide eight modes of operation summarized in Table 3 for the MAC 110, 112 and are as follows: MAC DUPLEX Mode refers to the operation of each MAC independently, each producing its own address sequence, while MAC SIMPLEX Mode refers to the operation of the MAC 110, 112 devices together to produce a single address sequence. Simplex mode comprises a full simplex mode and a half simplex mode. Full simplex mode uses both MACs 110, 112 but half simplex mode uses only the lower MAC 112. Half simplex mode is only an optimization for speed that comes from microcode partitioning. Duplex mode contains submodes called flushed duplex mode and non-flushed duplex mode each having three variations for a scalar, vector, and matrix. Flushed mode data are routed back into Address Generator 100 through the P BUS 128, whereas nonflushed mode data go to the AU 38 or the address directs where AU 38 data goes in RAM memory 50. Scalar, vector, and matrix modes determine when the rest of intelligent memory services the indices generated by the other MAC.
The architecture of the MAC 110, 112 devices allows direct support of array transformation statements and MAC 110, 112 devices are cascaded within the indices generator 111 to handle up to four dimension variables. The idea of displacement and length parameters as described hereinbefore is the basis of the array transformation statements. FIG. 11 shows the location of the array transformation parameters within the cascaded upper and lower matrix access chips 110, 112 of the indices generator 111. Array transformations supported by the pair of MAC 110, 112 devices allow for a generic method of generating an address sequence for a rank four output shape array from a rank two input shape array.
Referring now to FIG. 11, FIG. 12 and FIG. 13, there is shown the functional structure of a MAC 110, 112 device. Within each MAC 110, 112 there are four index generators 140-143 comprising an upper row and column index pair 140, 141 and a lower row and column index pair 142, 143. Each index generator contains a 16-bit base working register 162 and a 16-bit displacement register 160 as shown in FIG. 12. Each of these two registers can be written to and read from via the A BUS 30 prior to instruction execution.
The data in the displacement register 160 of the lower MAC 112, once loaded from the CI 36, will not change. All data held in the MAC 110, 112 will be 16-bit two's complement and loaded as integers. When the fractional addressing mode is selected, the shifters 148, 149 will shift the data 5 bits to the right (sign extended) before the index is produced. When the fractional addressing mode is not selected then all loaded data is in integer format and the shifters are transparent.
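The action of the shifters can be modelled in a few lines. This is a sketch only; the function names are hypothetical, and the register content is represented as a raw 16-bit value.

```python
def to_signed16(raw):
    """Interpret a raw 16-bit register value as a two's complement integer."""
    raw &= 0xFFFF
    return raw - 0x10000 if raw & 0x8000 else raw


def index_from_register(raw, fractional):
    """Model the shifter: a sign-extended shift right by 5 bits in fractional mode."""
    value = to_signed16(raw)
    # Python's >> on a negative int is an arithmetic (sign-extending) shift.
    return value >> 5 if fractional else value


integer_mode = index_from_register(0x0060, fractional=False)
fractional_mode = index_from_register(0x0060, fractional=True)
negative_fractional = index_from_register(0xFFE0, fractional=True)
```

In fractional mode the low 5 bits act as a fraction, so a register holding 96 (0x0060) produces index 3, and a register holding -32 (0xFFE0) produces index -1.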
The comparators 146, 147 make running comparisons between the current index and the contents of the two boundary registers, column length register 144 and row length register 145. The running comparison allows the circuitry to determine when the current index has exceeded the boundary limits of the "Input Shape" via the status bits in the condition code register and MUX 158. The Input Shape is defined by the contents of the column length register 144 and row length register 145. The upper length counter 150 and lower length counter 151 are used to count the number of displacement increments imposed on the contents of the Δ0 register contained in the same level. As each displacement is added to the current content of Δ0 the associated length counter within that level is decremented. Negative sign detection circuits generate status 152, 153 bits which flag to the microsequencer 122 the boundary of the output shape.
Still referring to FIG. 13 showing a block diagram of one of the index generators 140-143, both the displacement register 160 and the working register 162 have four clear and preset inputs 171-174. Two of these are dedicated to the selective clears or presets of the upper 15 bits of the register. The other two preset and clear inputs are dedicated to the least significant bit (LSB) of the register.
Hence, these registers 160 and 162 are asynchronously preset and cleared to one of four default values 0, 1, -1, -2. Each one of the index generators 140-143 has a 16-bit complement capability performed by exclusive-or gates 168 and the com-plement 175 signal. Also, there is a 16-bit fast carry adder 170 complete with carry input 176. This architecture allows the uncomplemented or complemented contents of the displacement register 160 or column length registers 144 and row length register 145 to be added to the current content of the working register 162.
The upper and lower level registers in the index generators 140-143 in each MAC 110, 112 are distinguished by the prefix U and L respectively. Thus, the displacement register 160 and the working register 162 in the index generator of the MAC 110, 112 devices have the following designations:
uri;uci = Row and Column upper level working registers (U Δ0)
lri;lci = Row and Column lower level working registers (L Δ0)
udri;udci = Row and Column upper level displacement registers (U Δ2/4)
ldri;ldci = Row and Column lower level displacement registers (L Δ1/3)
Hence, each MAC 110, 112 has an upper and lower displacement register 160 in each of the upper and lower index generators 140-143 for storing the Δ4 and Δ3 displacement parameters of an array transformation. Where there is a need to distinguish between upper and lower MACs, then the prefix U or L will be used.
Referring now to FIG. 14, there is shown a block diagram of the length counter 150, 151. The length number register 180 contains static data (i.e., not updated during execution) and acts as a reference for re-initializing the length count number register 182. The length count number register 182 contains a current count of the number of displacements that an index generator 140-143 will invoke in order to generate a one dimension access sequence. Each level of a MAC 110, 112 handles one dimension. Thus, the MAC 110, 112 length registers are referred to as follows:
UN = upper level number register
UCN = upper level count number register
LN = lower level number register
LCN = lower level count number register
The length counter 150, 151 is provided with a negative sign detection capability which allows the microcode to determine via the XCNEG 194 status bit whether or not to continue with the displacement increments within the index generator belonging to the same level.
The length counter 150, 151 decrements the current content of the length count number register 182 by one when a valid index offset pair has been generated. If the current count is negative then it stays negative even after several decrements executed due to condition code pipeline delay. This gives the circuitry more time to catch the end condition. Both the length number and count number registers 180 and 182 contained in a length counter 150, 151 allow read and write access over the bus in a similar manner as the index generator 140-143. However, these registers 180 and 182 are not independently writable because this function is not required.
Referring now to FIG. 15, there is shown a matrix with a plurality of zones identified outside of its boundaries. The row length 200 and column length 202 comprise the "INPUT SHAPE". The top left hand corner is regarded as the zero reference point. An initial displacement Δ0 is an offset from this reference point. The input shape or matrix consists of the number of rows and columns of data elements and is thus limited to two dimensions. Each data element is uniquely defined in terms of its row index and column index. The reference point is therefore defined as (0,0). An element in zones 1, 4 or 7 is said to have exceeded the lower row boundary of the input shape. The column index of any such elements will be negative. The MAC 110, 112 are therefore required to handle negative numbers via the two's-complement notation. The MAC 110, 112 supplies to the microsequencer 122 a status bit 157 sourced from the sign bit of the current column index. An element in zones 1, 2 or 3 is said to have exceeded the lower column boundary of the input shape or matrix. The row index of any such elements will be negative. The MAC 110, 112 supplies to the microsequencer 122 a status bit sourced from the sign bit of the current row index.
An element in zones 3, 6 or 9 is said to have exceeded the upper row boundary of the input shape or matrix. Any such elements will have positive column indices of integer value greater than or equal to the row length 200.
A running comparison is performed between the current column index of each level and the column length of the input shape. The MAC 110, 112 supplies to the microsequencer 122 a status bit 154 which indicates the result of this running comparison.
An element in zones 7, 8 or 9 is said to have exceeded the upper column boundary of the input shape or matrix. Any such elements will have positive row indices of integer value greater than or equal to the column length. A running comparison is performed between the current row index of each level and the column length. The MAC 110, 112 supplies to the microsequencer a status bit 155 which indicates the result of this running comparison.
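The zone numbering implied by the four boundary tests can be summarized in a short sketch. The function name `zone` is hypothetical, and the numbering of the input shape itself as zone 5 is an assumption inferred from the 3-by-3 arrangement; it is not stated in the text.

```python
def zone(row, col, column_length, row_length):
    """Classify an index pair into a zone numbered 1-9.

    A negative index selects the first band, an index at or beyond the
    boundary length selects the last band, anything else the middle band.
    Zone 5 (both indices in range) is assumed to be the input shape.
    """
    def band(index, length):
        if index < 0:
            return 0
        if index >= length:
            return 2
        return 1

    return 3 * band(row, column_length) + band(col, row_length) + 1


# Input shape with 4 rows (column length 4) and 5 columns (row length 5).
inside = zone(0, 0, 4, 5)
left_of_shape = zone(2, -1, 4, 5)     # column index negative
above_shape = zone(-1, 2, 4, 5)       # row index negative
right_of_shape = zone(1, 5, 4, 5)     # column index >= row length
below_shape = zone(4, 0, 4, 5)        # row index >= column length
```

Each band test corresponds to one of the status bits described above: two sign bits for the negative cases and two running comparisons against the boundary length registers.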
Referring now to FIG. 16, in addition to FIGs. 11-14, a MAC 112 is required to generate a sequence of elements in response to an array transformation where each element is specified by a row index and a column index pair which will define an output shape. Movement from one element to the next element in a sequence is accomplished by specifying the contents of a row and column displacement register 160 and adding it to the contents of the current working register 162 and decrementing the length counter 182 in a MAC 112. The lower level of the MAC 112 is used for this one dimensional movement. The displacement register 160 in the lower level and the working register 162 in lower level are loaded with the Δ1 displacement index parameter of the array transformation. In the example shown in FIG. 16 the working register 162 initially is loaded indirectly by the command interpreter 36 prior to going into execution mode with the index of point "1" as defined by a Δ0 in an array transformation. When a movement in the one dimension specified by the contents of the lower level displacement register 160 comes to an end (index 6 in FIG. 16) and when the length counter is negative then the following occurs: the displacement register 160 in the upper level of the same MAC 112 will contain the Δ2 displacement index of the array transformation required to move the current index of that level to the next linear sequence (i.e. from 1 to 7 in FIG. 16). The result, after being loaded back into the lower level working register 162 and validated, will be down-loaded into the lower level working register 162 of the MAC 112 and the length counter number (LCN) register 182 will be re-loaded from the contents of the length number (LN) register 180. The lower level of MAC 112 is then ready to once more complete a linear sequence from 7 to 12 in FIG. 16 as specified by the Δ1 displacement index parameter. This process corresponds to a two-dimensional address sequence since Δ1 and Δ2 will not be altered; also, this description can be extended to higher order address sequences up to four dimensions in the present embodiment.
Each time a new dimension is invoked, transfer of data from an upper level to a lower level of MAC 110, 112 hardware is required. It is therefore a requirement of the MAC 110, 112 that each Δ0 working register 162 have access from the level above which includes upper MAC 110 to lower MAC 112 transfers.
Referring now to FIG. 16 and FIG. 17, the indexes 5, 6 and 12 as shown in FIG. 16 are said to be outside the input shape or "out of bounds." The occurrence of these indices will be handled differently depending on the boundary mode selected. Each level of MAC 110, 112 contains a length number register 180 which stores the length control word as shown in FIG. 4B. The length number register 180 is loadable from the CI 36 during initialization and one bit of this control word is allocated to boundary mode B. The two modes selectable via this bit are "wrap-around" and "zero-fill". Each invokes different system responses when the current index is out of bounds (indicated by the boundary status bits). In zero-fill mode an out of bounds index is "valid" but the data obtained from memory will be substituted by zero. In a write port the data would simply not be written to memory. In wrap-around mode indices outside the boundary are regarded as invalid. It is a requirement in this mode that if a single displacement results in an index outside the input shape (matrix) it must be adjusted to point to an index inside the input shape. For index adjustment of the example shown in FIG. 17 to go from 24 to 19, it is required that the row length be subtracted from the column index. To go from 4 to 40 requires column length to be subtracted from the row index. In order to support this adjustment requirement, the MAC 110, 112 architecture allows the selective subtraction of the content of the boundary length registers 144, 145 in each level of the MAC 110, 112.
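The index adjustment can be sketched as a single selective add or subtract of a boundary length per axis. The function name, the mode strings and the returned flag are hypothetical. The text describes only the subtraction; the addition used here for negative indices is an assumption by symmetry, as is the premise that a single displacement overshoots by less than one boundary length.

```python
def adjust_index(row, col, column_length, row_length, mode):
    """Return (row, col, fetch_from_memory) after boundary handling.

    mode="zero": the index is left as is; out-of-bounds data reads as zero,
                 so no memory fetch is needed (flag False).
    mode="wrap": one boundary length is subtracted (or, by assumption, added)
                 so that the index points inside the input shape.
    """
    in_bounds = 0 <= row < column_length and 0 <= col < row_length
    if mode == "zero":
        return row, col, in_bounds
    if mode != "wrap":
        raise ValueError("mode must be 'wrap' or 'zero'")
    if col >= row_length:
        col -= row_length        # subtract row length from the column index
    elif col < 0:
        col += row_length        # assumed mirror case
    if row >= column_length:
        row -= column_length     # subtract column length from the row index
    elif row < 0:
        row += column_length     # assumed mirror case
    return row, col, True


wrapped_col = adjust_index(1, 5, 4, 5, "wrap")
wrapped_row = adjust_index(4, 2, 4, 5, "wrap")
zero_filled = adjust_index(1, 5, 4, 5, "zero")
```

A single conditional subtract per axis, rather than a general modulo, matches the stated requirement that an adjustment follow each single displacement.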
Referring now to FIG. 10, the MAC 110, 112 devices produce row and column indices representing offsets from an initial reference or starting point (Δ0) as specified in an array transformation. The Address Translator (AT) 114 together with the multiplier 118 converts this offset into a 30 bit address 130. In order to calculate this address, the AT 114 must know the base address, Δ0 (initial starting point), the row length (L1 - L4, number of elements in a row), and the packing factor of the data element (number of bits that comprise the data element). These values are loaded into the AT 114 prior to current instruction execution by the command interpreter 36 via the A BUS 30. The AT 114 registers are double-banked, allowing the command interpreter 36 to set up for the next instruction while the AT 114 is executing the current instruction.
The address translator 114 essentially converts a row index and column index identifying a location of data to a linear address to identify the location of the element of data. The AT 114 supplies the row length 139 to the multiplier 118, which then multiplies the row index 137 offset by the row length 139 and supplies the result back to the AT 114. The AT 114 adds this product with the column index 138 offset to obtain the total index offset from the initial starting point. The AT 114 converts this index offset to a physical offset by shifting the index offset by an amount equal to the packing factor, effectively multiplying by the number of bits in the data element. This address offset is then added to the base address to obtain the 30 bit address 130 for the data element. This 30-bit address 130 now points to the most significant bit of the data element to be accessed.
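The translation steps just described reduce to a multiply, an add, a shift and a final add. The sketch below follows that order. The function name and parameter names are hypothetical; the packing factor is taken to be a shift count (the base-2 logarithm of the element size in bits), and truncating the result to 30 bits is likewise an assumption.

```python
ADDRESS_BITS = 30


def translate(row, col, row_length, packing_shift, base_address):
    """Convert a (row, col) index offset into a bit address.

    packing_shift -- assumed log2 of the element size in bits (4 for 16-bit data)
    """
    index_offset = row * row_length + col        # multiplier product plus column index
    bit_offset = index_offset << packing_shift   # scale by bits per element
    return (base_address + bit_offset) & ((1 << ADDRESS_BITS) - 1)


# Element (2, 3) of a 16-bit matrix with 20 elements per row, based at bit 0x1000.
address = translate(2, 3, row_length=20, packing_shift=4, base_address=0x1000)
```

With these values the index offset is 2 x 20 + 3 = 43 elements, or 688 bits, giving a bit address of 4784.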
An alternate path exists for generating Fast Fourier Transform (FFT) coefficient addresses. This path replaces row length by a column index (multiplier 167) and the sum of the product and column index by the product itself (which is the product of the row and column index). The product of the row and column index represents a linear offset into the linear table of complex exponentials stored in the ROM memory bank 60. The Digital Fourier Transform (DFT) coefficient matrix is shown in FIG. 18; it illustrates that the exponent of w is the product of the row and column index.
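The alternate path can be illustrated by indexing a linear table of complex exponentials with the product of the row and column indices. This is a sketch under stated assumptions: the function names are hypothetical, the sign convention w = exp(-2πi/N) is chosen for illustration, and the modulo reduction of the product to the table length is an assumption (it relies on w raised to the N being 1) that the text does not spell out.

```python
import cmath


def coefficient_table(n):
    """Linear table of the n complex exponentials w**m, m = 0 .. n-1."""
    return [cmath.exp(-2j * cmath.pi * m / n) for m in range(n)]


def dft_coefficient(table, row, col):
    """Look up w**(row*col); the product is the linear offset into the table."""
    return table[(row * col) % len(table)]


def dft(samples):
    """Direct DFT using only table lookups addressed by row*col."""
    n = len(samples)
    table = coefficient_table(n)
    return [sum(dft_coefficient(table, r, c) * samples[c] for c in range(n))
            for r in range(n)]


impulse_spectrum = dft([1, 0, 0, 0])
constant_spectrum = dft([1, 1, 1, 1])
```

The impulse transforms to a flat spectrum and the constant input to a single nonzero bin, which is the expected behaviour of a coefficient matrix whose exponent is the product of the row and column indices.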
Referring now to FIG. 3, the data formatter (DF) 104 may be implemented with two identical 180 pin, 2500 gate, CMOS gate arrays, each having a 32-bit slice and may be configured either as Read or Write data formatters. This configuration is controlled by hardwired connections.
As a Read formatter, the data formatter 104 reads 64 bit packed data from the Memory banks 59, unpacks the data element, shifts and masks all unnecessary bits to zero, and presents to the AU 38 a left-justified data element. Shift amounts and mask parameters (i.e. packing factor etc.) were previously loaded into the data formatter 104 by the command interpreter 36. As in the address translator 114, data formatter 104 control registers are also double banked. All normalization calculations are performed by the command interpreter 36, which in turn informs the DF 104 of the results via the shift amount.
As a Write formatter, the data formatter 104 is presented left-justified data elements from the arithmetic unit 38. The data formatter 104 must perform a read-modify-write operation, packing this new data element among the unchanged data elements of the 64 bit word in intelligent memory 12. The control circuitry of intelligent memory 12 is such that a read for the read-modify-write occurs only if the 64 bit boundary has been crossed. This eliminates unnecessary memory reads by the write port.
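The unpack and read-modify-write operations can be sketched with integer shifts and masks. The function names are hypothetical, the bit offset is taken to count from the most significant bit of the 64 bit word (consistent with the address pointing to the most significant bit of the element), and the 32-bit width of the left-justified output is an assumption for illustration.

```python
WORD_BITS = 64
OUT_BITS = 32   # assumed width of the left-justified element


def read_format(word, bit_offset, width):
    """Extract the element whose MSB lies at bit_offset and left-justify it."""
    shift = WORD_BITS - bit_offset - width
    element = (word >> shift) & ((1 << width) - 1)   # mask unnecessary bits to zero
    return element << (OUT_BITS - width)


def write_format(word, bit_offset, width, left_justified):
    """Read-modify-write: replace one element, leaving its neighbours unchanged."""
    element = (left_justified >> (OUT_BITS - width)) & ((1 << width) - 1)
    shift = WORD_BITS - bit_offset - width
    mask = ((1 << width) - 1) << shift
    return (word & ~mask) | (element << shift)


packed = 0x0123456789ABCDEF
second_element = read_format(packed, bit_offset=16, width=16)
updated = write_format(packed, bit_offset=16, width=16, left_justified=0xBEEF0000)
```

Here the second 16-bit element (0x4567) is presented as 0x45670000, and writing 0xBEEF in its place changes only bits 16 through 31 of the packed word.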
Still referring to FIG. 3 there is shown the Memory Controller (MC) 102 which may be implemented with a 144 pln, 3500 gate, CMOS gate array and provides overall central control for the intelligent memory 12 pipeline. In addition, it provides interfaces to the command interpreter 36, arithmetic unit 38 and the arbitration and switching network 62.
The A BUS 30.interface of the memory controller 102 decodes the A BUS address, providing chip selects as required, and acts upon or distributes all A BUS control signals as required. The memory controller 102 also controls bidirec-tional buffers for the A BUS 30 data path. This method ~2,~ 9~

allows each intelligent memory 16-20 port to present only one load to the A BUS 30, and allows all decode circuitry to reside in one central location for each of said ports 16-20.
The AU interface supplies data ready and data request control lines to the arithmetic unit 38. These lines are used to control data flow between the AU 38 and an intelligent port 16, 18, 20. ~ased on the state of these control lines, the memory controller 102 has the ability to selectively start and stop the intelligent port pipeline as required.
The memory controller 102 also provides the intelligent port's interface to the arbitration and switching network 62.
The memory controller 102 receives the 30 bit address 130 from the address generator 100, and decides if a memory access is necessary (if the 64 bit word boundary has been crossed). If a memory access is required, the memory controller 102 generates a 3 bit BANK REQUEST 93 code to the arbitration and switching network 62. The memory controller 102 then looks for the bank acknowledge 9~ signal from the arbitration logic, stopping the intelligent port pipeline and notifying the arithmetic unit 38 if the port has lost memory arbitration. Thus the memory controller 102 controls the overall flow of data through the intelligent port pipeline, and between the intelligent port 16-20, the arithmetic unit 38 and the memory banks 52-59.
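The decision made by the memory controller 102 for each incoming address can be sketched as below. This is an illustrative model only. The function name `bank_requests`, the treatment of the address as an element index, and in particular the rule that the bank code is the word index modulo eight are assumptions; the description says only that the BANK REQUEST code is 3 bits wide and that a request is needed only when the 64 bit word boundary has been crossed.

```python
NUM_BANKS = 8  # a 3 bit BANK REQUEST code can distinguish eight banks


def bank_requests(addresses, elements_per_word):
    """Return (address, bank_code or None) for each address in sequence.

    A bank code is produced only when the address falls in a different
    64 bit word than the previous address; otherwise no memory access
    is requested and the entry carries None.
    """
    last_word = None
    result = []
    for addr in addresses:
        word = addr // elements_per_word
        if word != last_word:
            # Word boundary crossed: a memory access is required.
            # Mapping word index to bank by modulo is a hypothetical choice.
            result.append((addr, word % NUM_BANKS))
            last_word = word
        else:
            result.append((addr, None))
    return result


if __name__ == "__main__":
    for addr, bank in bank_requests([0, 1, 2, 3, 4, 8, 9, 64], 4):
        print(addr, "no request" if bank is None else f"bank {bank:03b}")
```

A fuller model would also stall the sequence when the acknowledge is not received; that step is omitted here because the arbitration rules are not given in this section.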
This concludes the description of the preferred embodiment. However, many modifications and alterations would be obvious to one of ordinary skill in the art without departing from the spirit and the scope of the inventive concept. For example, the number of RAM or ROM storage locations in the intelligent memory 12 may vary and the number of intelligent ports may vary depending on system applications. Also, the multiplier 118 in the address generator 100 could be removed if memory chips with two dimensional structures were available externally for row and column indices to index into directly instead of the current approach of producing a linear address displacement first and then breaking it down inside memory chips into row and column addresses. In this case, FFT coefficients could be generated by using variable "deltas" as provided for in duplex mode. In addition, the parameters of the array transformation and supporting hardware embodiment can be expanded to specify additional displacement sequences for arrays of higher rank. Therefore, it is intended that the scope of this invention be limited only by the appended claims.
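The "current approach" referred to above, in which row and column indices are first combined into a linear address displacement and the memory chip later splits the address into its own row and column addresses, can be illustrated with a short sketch. The function names and the `col_bits` chip geometry parameter are hypothetical, and the row-major formula is an assumption about what the multiplier 118 computes.

```python
def linear_displacement(row, col, row_length):
    """Combine matrix indices into one displacement (assumed row-major).

    The multiplication by row_length is the step a multiplier would
    perform in an address generator.
    """
    return row * row_length + col


def chip_row_col(address, col_bits):
    """Split a linear address into chip row and column address fields.

    col_bits is a hypothetical number of column address bits in the chip.
    """
    return address >> col_bits, address & ((1 << col_bits) - 1)


if __name__ == "__main__":
    disp = linear_displacement(2, 3, 10)
    print(disp, chip_row_col(disp, 4))
```

If a memory accepted the matrix row and column indices directly, the multiplication in `linear_displacement` would not be needed, which is the modification the paragraph above contemplates.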

Function  Description
Z         Initialize the specified row and column registers to zero so that they point to the upper left corner of the matrix. This results in zero displacement.
+i        Initialize the specified delta row register with zero, and the delta column register with one. This results in movement across the row one point to the right.
+j        Initialize the specified delta row register with one, and the delta column register with zero. This results in movement down the column one point.
+k        Initialize the specified delta row register with one and the delta column register with one. This results in a one point diagonal movement downward to the right.
P         Initialize the specified delta row register with zero and the delta column register with zero. This function also specifies that the row and column delta registers need data substitution prior to every round of execution. (Not applicable to Delta 1 and Delta 2 fields.)
-i        Initialize the delta row register with zero and the delta column register with negative one. Movement is one point to the left.
-j        Initialize the delta row register with negative one and the delta column register with zero. Movement is one point up the column.
-k        Initialize the delta row and column registers with negative one. Movement is one point diagonally upward to the left.
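The delta functions tabulated above can be read as (delta row, delta column) pairs that are added to the row and column registers on each step. The following sketch illustrates that reading; the names `DELTAS` and `walk` are hypothetical, the plus-signed mnemonics are written by analogy with the minus-signed ones, and boundary handling and the data substitution of the P function are not modelled.

```python
# (delta_row, delta_col) for each movement function in the table.
DELTAS = {
    "+i": (0, 1),    # one point to the right along the row
    "+j": (1, 0),    # one point down the column
    "+k": (1, 1),    # diagonal, downward to the right
    "-i": (0, -1),   # one point to the left
    "-j": (-1, 0),   # one point up the column
    "-k": (-1, -1),  # diagonal, upward to the left
}


def walk(function, steps, start=(0, 0)):
    """Return the (row, col) points visited, beginning at `start`.

    The default start of (0, 0) corresponds to the Z function, which
    points the row and column registers at the upper left corner.
    """
    d_row, d_col = DELTAS[function]
    row, col = start
    points = [(row, col)]
    for _ in range(steps):
        row += d_row
        col += d_col
        points.append((row, col))
    return points


if __name__ == "__main__":
    print(walk("+k", 3))
    print(walk("-i", 2, start=(0, 5)))
```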

FUNCTION  DEFINITION
-1        Repeat as necessary to match a corresponding output array shape
I         Row length of input array
J         Column length of input array
S         Stop on boundary
NOP       Default option
NU        Not used
NU        Not used

NAME      DEFINITION
FSPX      Full Simplex Mode
DPXS      Duplex Mode Scalar
DPXV      Duplex Mode Vector
DPXM      Duplex Mode Matrix
HSPX      Half Simplex Mode
FDPXS     Flushed Duplex Mode Scalar
FDPXV     Flushed Duplex Mode Vector
FDPXM     Flushed Duplex Mode Matrix

Claims (9)

THE EMBODIMENTS OF THE INVENTION IN WHICH AN EXCLUSIVE PROPERTY OR PRIVILEGE IS CLAIMED ARE DEFINED AS FOLLOWS:
1. A memory comprising:
means for storing data;
a plurality of read/write port means, said port means comprising means for transferring data to and from a bus means in accordance with addressing sequences specified by an array transformation; and
switching network means coupled between said storing means and said port means for routing data transfers between said storing means and said port means.
2. The memory as recited in claim 1 wherein, said port means comprises at least two intelligent ports for transferring said data to and from an arithmetic means.
3. The memory as recited in claim 2 wherein:
at least one of said intelligent ports operates in a read mode and at least one of said intelligent ports operates in a write mode.
4. The memory as recited in claim 1 wherein:
said storing means comprises at least one random access memory and at least one read only memory.
5. The memory as recited in claim 1 wherein:
said array transformation comprises a plurality of parameters for the generation of said addressing sequences for transferring a data array to or from said memory, said array comprises a vector, a matrix or a block of data.
6. An intelligent memory comprising:
means for storing data;
port means for transferring data to and from said memory in accordance with addressing sequences specified by an array transformation;
said port means comprising an address generator for generating said addressing sequences as specified by said array transformation;
network means for coordinating data transfers between said plurality of port means and said storing means;
a data formatter for packing and unpacking data to and from said storing means; and
a memory controller coupled to said address generator and said network means for controlling said address generator and said data formatter.
7. The intelligent memory as recited in claim 6 wherein:
said array transformation comprises a plurality of parameters for generating said addressing sequences for storing or accessing a data array; and
said array comprises a vector, a matrix or a block of data.
8. The intelligent memory as recited in claim 6 wherein:
said port means comprises at least three intelligent ports for transferring data to and from said memory, at least two of said ports operating in a read mode and at least one of said ports operating in a write mode; and
said port means further comprises a direct memory access port for input-output data transfers.
9. The intelligent memory as recited in claim 6 wherein:
said storing means comprises at least one random access memory and at least one read-only memory.
CA000583309A 1985-04-05 1988-11-16 Method and apparatus for addressing a memory by array transformations Expired CA1264093A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CA000583309A CA1264093A (en) 1985-04-05 1988-11-16 Method and apparatus for addressing a memory by array transformations

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US72033085A 1985-04-05 1985-04-05
US720,330 1985-04-05
CA000504551A CA1250370A (en) 1985-04-05 1986-03-19 Method and apparatus for addressing a memory by array transformations
CA000583309A CA1264093A (en) 1985-04-05 1988-11-16 Method and apparatus for addressing a memory by array transformations

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CA000504551A Division CA1250370A (en) 1985-04-05 1986-03-19 Method and apparatus for addressing a memory by array transformations

Publications (1)

Publication Number Publication Date
CA1264093A true CA1264093A (en) 1989-12-27

Family

ID=25670943

Family Applications (2)

Application Number Title Priority Date Filing Date
CA000583308A Expired CA1262968A (en) 1985-04-05 1988-11-16 Method and apparatus for addressing a memory by array transformations
CA000583309A Expired CA1264093A (en) 1985-04-05 1988-11-16 Method and apparatus for addressing a memory by array transformations

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CA000583308A Expired CA1262968A (en) 1985-04-05 1988-11-16 Method and apparatus for addressing a memory by array transformations

Country Status (1)

Country Link
CA (2) CA1262968A (en)

Also Published As

Publication number Publication date
CA1262968A (en) 1989-11-14

Legal Events

Date Code Title Description
MKLA Lapsed