GB2237908A - Parallel processing of data - Google Patents


Info

Publication number
GB2237908A
Authority
GB
United Kingdom
Prior art keywords
data
column
row
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB9020776A
Other versions
GB9020776D0 (en
GB2237908B (en
Inventor
Steven Maxwell Parkes
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BAE Systems PLC
Original Assignee
BAE Systems PLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to GB898925227A (GB8925227D0)
Application filed by BAE Systems PLC
Priority to GB9020776A (GB2237908B)
Publication of GB9020776D0
Publication of GB2237908A
Application granted
Publication of GB2237908B
Anticipated expiration
Application status is Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/80 Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
    • G06F15/8007 Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors; single instruction multiple data [SIMD] multiprocessors
    • G06F15/8023 Two dimensional arrays, e.g. mesh, torus

Abstract

In parallel processing of data, the data is organised into a two dimensional array having at least two rows (5a, 5b, 5c, 5d) and at least two transverse linking columns (6a, 6b, 6c, 6d), first high level data processing is carried out by first processing means on the rows or on the columns, corner turning is carried out on the first processed data to turn it from said rows into said columns or vice versa, and second high level data processing is carried out by second processing means on the corner turned data in said columns or in said rows, with the first processed data in said rows or columns being stored, before or after corner turning, in separate memories (3a, 3b, 3c, 3d) associated one with each row (5a, 5b, 5c, 5d) or column (6a, 6b, 6c, 6d).

Description

Method and Apparatus for Parallel Processing Data

This invention relates to a method and apparatus for parallel processing data, particularly, but not exclusively, suitable for the processing of signal and/or image data.

Data is commonly stored serially row by row on a direct access bulk storage peripheral such as a disc file unit. Such data may be transferred to or from the disc file in blocks which are stored at random on the disc. Thus if it is required to access the columns of a matrix stored row by row, many blocks will require retrieval from the disc to access the column elements. This is time consuming and inefficient.

One way of reorganising the stored data is to transpose the data so that the stored blocks contain data in serial column order instead of serial row order. This reorganisation is termed 'corner turning'. Conventionally such corner turning has been implemented by writing the row ordered data into a single large memory and then reading it out in column order using a "column ordered" address generator. However this known technique has the disadvantage of causing a communications bottleneck.
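The conventional technique just described can be sketched as follows. This is only an illustration under stated assumptions: the function names and the 3 by 4 test matrix are hypothetical and do not appear in the specification, and a Python list stands in for the single large memory.

```python
def column_ordered_addresses(n_rows, n_cols):
    """Yield flat addresses that visit a row-ordered memory in column order."""
    for col in range(n_cols):
        for row in range(n_rows):
            yield row * n_cols + col


def corner_turn_single_memory(matrix):
    """Corner turn through one memory: write row by row, read column by column."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    # Row-ordered write into the single (assumed flat) memory.
    memory = [value for row in matrix for value in row]
    # Column-ordered read using the address generator.
    return [memory[addr] for addr in column_ordered_addresses(n_rows, n_cols)]


# Each element is labelled (row, column) so the reordering is visible.
example = [[(r, c) for c in range(4)] for r in range(3)]
turned = corner_turn_single_memory(example)
```

Every word passes through the one memory twice, once on the write and once on the read, which is where the bottleneck arises.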

There is thus a need for a generally improved method and apparatus for parallel processing of data which is more efficient and which causes less of a communications bottleneck than the aforementioned conventional techniques.

According to one aspect of the present invention there is provided a method of parallel processing data, in which the data is organised into a two dimensional array having at least two rows and at least two transverse linking columns, first high level data processing is carried out on the rows or on the columns, corner turning is carried out on the first processed data to turn it from said rows into said columns or vice versa, and second high level data processing is carried out on the corner turned data in said columns or in said rows, with the first processed data in said rows or columns being stored, before or after corner turning, in separate memories associated one with each row or column.

Thus the corner turning memory is distributed between two or more column processing elements. By operating all the memories in parallel the communications bottleneck caused by a single large corner turning memory is overcome.

Preferably said first high level data processing is carried out on each of said rows of data, the corner turning is carried out on the processed row data to turn it into column ordered data and said second high level data processing is carried out on the column ordered data.

Conveniently said first high level processing is carried out by one row processor per row, said second high level processing is carried out by one column processor per column and the processed row data is stored, in said separate memories associated one with each row, before corner turning.

Advantageously said first high level processing is carried out by one row processor per row, said second high level processing is carried out by one column processor per column and the processed row data is stored in separate memories associated one with each column after corner turning.

Preferably corner turning is carried out by feeding the processed data from each row in sequence, in parallel into a shift register associated one with each column to form a series of data sets and shifting the series of data sets from each shift register into the associated memory in column order, from whence the column ordered data can be read by the associated column processor.

Conveniently said first high level processing is carried out on each of said columns of data, the corner turning is carried out on the processed column data to turn it into row ordered data and said second high level processing is carried out on the row ordered data.

Advantageously said first high level processing is carried out by one column processor per column, said second high level processing is carried out by one row processor per row and the processed column data is stored after corner turning in said separate memories associated one with each row.

Preferably the corner turning is carried out by feeding the processed data from each column in sequence, in parallel into a shift register associated one with each row to form a series of data sets and shifting the series of data sets from each shift register into the associated memory in row order, from whence the row ordered data can be read by the associated row processor.

Conveniently one dimensional Fast Fourier Transforms are carried out on the data in each processor.

According to a second aspect of the present invention there is provided apparatus for the parallel processing of data, including means for organising data into a two dimensional array having at least two rows and at least two transverse linking columns, first processing means for carrying out first high level data processing on the rows or the columns, corner turning means for carrying out corner turning on the first processed data to turn it from said rows into said columns or vice versa, second processing means for carrying out second high level data processing on the corner turned data in said columns or in said rows, and at least two separate memories associated one with each row or column, which memories are located and operable to store the first processed data in said rows or columns before or after corner turning.

Preferably the first and second processing means are data processors located one in each row and column and wherein the corner turning means includes a plurality of shift registers located one in each column.

Conveniently the array has at least two substantially parallel rows, with the first processing means data processors being located respectively one at each input end of each row, with the output end of each row being connected to the shift register of one column and with the rows being connected intermediate the row ends to the shift register of another column, and wherein the second processing means data processors are located respectively one at each output end of each column to receive the output from the associated shift register.

Advantageously the memories are located one in each row between the associated row data processor and the row connections to the column shift register most remote from the output ends of the rows.

Preferably the memories are located one in each column between the associated column data processor input and the associated column shift register output.

Conveniently each data processor is operable to carry out one dimensional Fast Fourier Transforms.

For a better understanding of the present invention, and to show how the same may be carried into effect, reference will now be made, by way of example, to the accompanying drawings, in which:

Figure 1 is a block diagram of apparatus according to a first embodiment of the invention for parallel processing data,

Figure 2 is a diagram illustrating an arrangement of shift registers to achieve corner turning of data using the method according to the present invention and the apparatus of Figure 1,

Figure 3 is a diagram illustrating the relative timing of the control signals used by the shift register arrangement of Figure 2, and

Figure 4 is a view similar to that of Figure 1 showing a block diagram of an apparatus for parallel processing data according to a second embodiment of the invention.

As shown in the accompanying drawings, the apparatus and method of the invention for parallel processing of data, such as signal and/or image data, basically involves organising the data into a two dimensional array having at least two rows and at least two transverse linking columns. In the embodiments illustrated in Figures 1 and 4 there are four such rows 5a, 5b, 5c and 5d and four such columns 6a, 6b, 6c, 6d. First high level data processing is carried out on the rows 5a, 5b, 5c, 5d or on the columns 6a, 6b, 6c, 6d, corner turning is carried out on the first processed data to turn it from the rows into the columns or vice versa, and second high level data processing is carried out on the corner turned data in the columns or in the rows. The first processed data in the rows 5a, 5b, 5c, 5d or in the columns 6a, 6b, 6c, 6d is stored, before or after corner turning, in separate memories 3a, 3b, 3c, 3d associated one with each row or column.

In the embodiment illustrated in Figure 1 the first high level data processing is carried out on each of the rows 5a, 5b, 5c, 5d by one row processor 1a, 1b, 1c, 1d and the second high level processing is carried out on the column ordered data by one column processor 4a, 4b, 4c, 4d. The corner turning is carried out on the processed row data by a plurality of shift registers 2a, 2b, 2c, 2d located respectively one in each column 6a, 6b, 6c, 6d. The processed row data is stored in separate memories 3a, 3b, 3c, 3d associated one with each column, after corner turning. Although in the illustrated embodiments of Figures 1 and 4 four rows and four columns have been shown, it is of course to be understood that the method and apparatus of the invention is operable with at least two rows and at least two columns.

Corner turning is carried out by feeding the processed data from each row 5a, 5b, 5c, 5d in sequence, in parallel, into the shift registers 2a, 2b, 2c, 2d, associated one with each column, to form a series of data sets. The series of data sets for each shift register 2a, 2b, 2c, 2d is shifted into the associated memory 3a, 3b, 3c, 3d in column order, from whence the column ordered data can be read by the associated column processor 4a, 4b, 4c or 4d.

The row and column processors have a high functionality, performing complete operations on segments of data (for example 256 samples) rather than elementary operations on single data samples. In the Figure 1 embodiment each column processor 4a, 4b, 4c, 4d has an associated memory 3a, 3b, 3c, 3d into which the corner turned data is stored prior to column processing.

Thus the total memory required to hold the data is distributed between all the column processors.

This provides a high bandwidth communications structure connecting a parallel array of row processors with a concurrently operating array of column processors. Extremely high performance may be obtained without communication bottlenecks, with the addition of further rows and columns, and thus further row processors and column processors, automatically increasing the data input/output bandwidth. One dimensional data may be processed by organising it in a two dimensional form prior to processing. Data in three or more dimensions may be processed by first organising the data into two dimensional arrays of data.
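The specification does not say how one dimensional data is to be organised into two dimensional form. One assumed possibility, shown here purely as a sketch, is to cut the sample stream into consecutive rows of equal length. The function name `organise_2d` and the parameter `row_length` are hypothetical.

```python
def organise_2d(samples, row_length):
    """Split a one dimensional sequence into consecutive rows of equal length.

    Assumption: rows are filled in arrival order; the specification does
    not prescribe any particular organisation.
    """
    if row_length <= 0 or len(samples) % row_length != 0:
        raise ValueError("sample count must be a positive multiple of row_length")
    return [list(samples[i:i + row_length])
            for i in range(0, len(samples), row_length)]


# Twelve samples organised as three rows of four.
array_2d = organise_2d(list(range(12)), row_length=4)
```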

Although not illustrated, the first high level processing could be carried out on each of the columns of data, the corner turning carried out on the processed column data to turn it into row ordered data and the second high level processing carried out on the row ordered data. In other words the sequence of Figure 1, in which data is inputted at 7 and outputted at 8, could be reversed. In such an alternative, the first high level processing would be carried out by the column processors, the second high level processing carried out by the row processors and the processed column data stored, after corner turning, in the separate memories associated one with each row. The Figure 4 embodiment illustrates such alternative apparatus in which the memories are associated with the row processors, although in the illustrated Figure 4 embodiment the data input 7 is to the rows and the data output 8 is from the columns.

Example 1

The example algorithm used in the method of the invention is the two dimensional Fast Fourier Transform (FFT). This is a well known algorithm which may be implemented by first applying a one dimensional FFT to all the rows (5a, 5b, 5c, 5d) of the two dimensional data array followed by applying a one dimensional FFT to all the columns (6a, 6b, 6c, 6d) of the resultant data array. In this example a 64 by 64 point array of data as shown in Table 1 is to be transformed by processor apparatus according to the first embodiment of the invention as illustrated in Figure 1.

In this particular case the row processors 1a, 1b, 1c, 1d and the column processors 4a, 4b, 4c, 4d all perform the same function, namely a 64 point one dimensional FFT.
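The row-then-column decomposition of the two dimensional FFT can be checked numerically. The sketch below is illustrative only: it uses a small pure-Python radix-2 FFT and a software transpose in place of the row processors, shift registers and column processors, and all function names are assumptions introduced here.

```python
import cmath


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def transpose(a):
    """Software stand-in for corner turning."""
    return [list(col) for col in zip(*a)]


def fft2_row_column(a):
    """2D FFT as row FFTs, a corner turn, then column FFTs.

    The result is column ordered: result[v][u] holds frequency (u, v).
    """
    rows_done = [fft(row) for row in a]        # first high level processing
    turned = transpose(rows_done)              # corner turn
    return [fft(col) for col in turned]        # second high level processing


def dft2_direct(a):
    """Direct 2D DFT from the definition, used only as a reference."""
    n, m = len(a), len(a[0])
    result = []
    for u in range(n):
        row = []
        for v in range(m):
            total = 0j
            for r in range(n):
                for c in range(m):
                    total += a[r][c] * cmath.exp(
                        -2j * cmath.pi * (u * r / n + v * c / m))
            row.append(total)
        result.append(row)
    return result
```

Because the output of `fft2_row_column` is column ordered, it has to be transposed once more before it can be compared element by element with the row ordered reference.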

TABLE 1

 0,0   0,1   0,2   0,3   0,4   0,5   ...
 1,0   1,1   1,2   1,3   1,4   1,5   ...
 2,0   2,1   2,2   2,3   2,4   2,5   ...
 3,0   3,1   3,2   3,3   3,4   3,5   ...
 4,0   4,1   4,2   4,3   4,4   4,5   ...
 5,0   5,1   ...
 6,0   6,1   ...
 7,0   ...
 8,0   ...
 ...
60,0   ...
61,0  61,1   ...
62,0  62,1  62,2   ...
63,0  63,1  63,2  63,3  63,4  63,5   ...

The data was processed four rows at a time. The first four rows of data were passed through the four row processors 1a, 1b, 1c, 1d, which performed 64 point FFTs on the data rows 5a, 5b, 5c, 5d respectively. The row processors output their results in the same order as the data went in. The first set of data to emerge was (0,0) from row processor 1a, (1,0) from row processor 1b, (2,0) from row processor 1c and (3,0) from row processor 1d. This set of data was loaded in parallel into the first shift register 2a, then shifted out and placed in memory 3a. The next set of data from the row processors [(0,1) (1,1) (2,1) (3,1)] was loaded into the next shift register 2b, then shifted out into memory 3b. In a similar way memory 3c received the data [(0,2) (1,2) (2,2) (3,2)] and memory 3d received the data [(0,3) (1,3) (2,3) (3,3)]. The next set of data to emerge from the row processors, [(0,4) (1,4) (2,4) (3,4)], was loaded by the first shift register 2a into memory 3a.

After rows 5a to 5d had been processed, the next four rows were processed, starting with the data set [(4,0) (5,0) (6,0) (7,0)]. This procedure continued with the row processors processing each set of four rows of data in turn until the last set of data [(60,63) (61,63) (62,63) (63,63)] had been loaded into memory 3d.

Now memory 3a contains all the data from every fourth column of the data array starting at column 6a (i.e. columns 0, 4, 8, ...), memory 3b contains all the data from every fourth column starting at column 6b (i.e. columns 1, 5, 9, ...) and memories 3c and 3d contain all the data from every fourth column starting at columns 6c and 6d respectively.
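The distribution of data sets over the four memories in Example 1 can be reproduced with a short simulation. This is a sketch under assumptions: each element is represented by its (row, column) label rather than by an FFT result, columns are numbered from 0 as in Table 1, and each memory is modelled as a list that records arrival order only, not the addresses at which words are stored.

```python
N = 64   # the array is N by N points
P = 4    # number of row processors, shift registers and memories

# memories[0] stands for memory 3a, memories[1] for 3b, and so on.
memories = [[] for _ in range(P)]

for first_row in range(0, N, P):          # four rows at a time
    for col in range(N):                  # results emerge in input order
        # One data set: one value from each of the four row processors.
        data_set = [(first_row + k, col) for k in range(P)]
        # Data sets are handed to the shift registers in rotation, and
        # each shift register serialises its set into its own memory.
        memories[col % P].extend(data_set)

# Column indices that ended up in each memory.
columns_held = [sorted({c for (_, c) in mem}) for mem in memories]
```

In this model `columns_held[0]` is every fourth column starting at column 0, and each memory holds one quarter of the 4096 points.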

TABLE 2

 0,0   0,1   0,2   0,3   0,4   0,5   ...
 1,0   1,1   1,2   1,3   1,4   1,5   ...
 2,0   2,1   2,2   2,3   2,4   ...
 3,0   3,1   3,2   3,3   3,4   3,5   ...
 4,0   4,1   4,2   4,3   4,4   4,5   ...
 5,0   5,1   5,2   ...
 6,0   6,1   ...
 7,0   ...
 8,0   ...
 ...
60,0   ...
61,0  61,1   ...
62,0  62,1  62,2   ...
63,0  63,1  63,2  63,3  63,4  63,5   ...

The column processors 4a, 4b, 4c and 4d can now read the column orientated data out of the memories 3a, 3b, 3c and 3d respectively and process each column in turn. The column processors 4a, 4b, 4c and 4d first process columns 6a, 6b, 6c, 6d (0, 1, 2 and 3) respectively, followed by successive columns (4, 5, 6 and 7) and so on until all the columns of data have been processed. The column processors will perform 64 point FFTs on each column of data in the example two dimensional FFT algorithm. The data from the processing apparatus appears in column order at the output of the column processors. If desired a further parallel processing apparatus may be added to the output of the column processors 4a, 4b, 4c, 4d to convert the column ordered data back to row ordered form.

By using a shift register structure to perform the corner turning the memory elements required to hold the corner turned data before column processing are distributed evenly between the four column processors 4a, 4b, 4c, 4d. The four memories 3a, 3b, 3c, 3d are accessed concurrently, thereby improving data throughput compared with a conventional single memory arrangement.

Further to illustrate the method of the present invention, a specific implementation of the shift register structure will now be described. With reference to Figure 2, an array of four-bit shift registers was connected to the outputs of the row processors 1a, 1b, 1c, 1d and to the inputs of the column memories 3a, 3b, 3c, 3d. The data output from the row processor 1a is represented by A0, A1, A2, ... where A0 is the least significant bit of the data word, A1 the next bit and so on. Similarly B0, B1, B2, ... is the output from row processor 1b; C0, C1, ... the output from row processor 1c and D0, D1, ... the output from row processor 1d.

The input to column memory 3a is represented by E0, E1, E2, ... where E0 is the least significant bit, E1 the next bit and so on. F0, F1, ...; G0, G1, ... and H0, H1, ... are the inputs to column memories 3b, 3c and 3d respectively.

Each four bit shift register is controlled by two signals, LD and SH. LD causes the data at the parallel input (P0, P1, P2, P3) of the shift register to be parallel loaded into the register. SH causes the data within the shift register to be shifted down one position. The serial (shifted) data appears at the output, SOUT, of the shift register. Each vertical bank of shift registers in Figure 2 has common LD and SH control signals. For example the first bank (column 2a), which generates the corner turned signals E0, E1, ... for column memory 3a, uses the signals SHE and LDE. The relative timing of the shift register control signals is shown in Figure 3.
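The behaviour of a single four bit shift register under the LD and SH signals can be modelled as below. This is a behavioural sketch, not a description of the actual circuit: the class and method names are hypothetical, string labels stand in for individual bits, and the value shifted into the vacated top position (0 here) is an assumption, since the specification does not state it.

```python
class ShiftRegister:
    """Parallel-in, serial-out register controlled by LD and SH."""

    def __init__(self, length=4):
        self.cells = [0] * length

    def ld(self, parallel_in):
        """LD: load the parallel inputs P0..P3 into the register."""
        if len(parallel_in) != len(self.cells):
            raise ValueError("parallel input width must match register length")
        self.cells = list(parallel_in)

    def sh(self):
        """SH: shift the contents down one position (fill value is assumed)."""
        self.cells = self.cells[1:] + [0]

    @property
    def sout(self):
        """SOUT: the value currently at the serial output."""
        return self.cells[0]


# Bit 0 of the words from row processors 1a, 1b, 1c, 1d.
reg = ShiftRegister()
reg.ld(["A0", "B0", "C0", "D0"])
seen = [reg.sout]
for _ in range(3):
    reg.sh()
    seen.append(reg.sout)
```

One parallel load followed by three shifts presents the four bits at SOUT one after another, which is the serialisation that the corner turning relies on.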

The operation of the shift register "corner turning" structure will now be described. As the first set of data emerges from the row processors 1a, 1b, 1c, 1d during clock period T0 (see Figure 3) the LDE (load shift register 2a) signal is activated. This causes the data from all the row processors to be loaded into the first column of shift registers (column 2a in Figure 2). If a 16 bit word is used as the output from each row processor then there will be sixteen shift registers in the column, i.e. one shift register for each bit. Since there are four row processors 1a, 1b, 1c, 1d in this example, each shift register 2a, 2b, 2c, 2d will be four bits long. Once the data has been loaded into the first column of shift registers (column 2a) the data from row processor 1a (A0, A1, ...) is immediately available at the outputs of those shift registers (E0, E1, ...).

During the next clock period T1 the next set of data emerges from the row processors and is loaded into the second column of shift registers (column 2b) by signal LDF. At the same time the SHE line is activated, shifting the data in the first column of shift registers (column 2a) down one place so that the data previously loaded from row processor 1b is available at their outputs.

On the next clock pulse (T2) the third set of data from the row processors is loaded into the third column of shift registers (column 2c) by signal LDG. Signals SHE and SHF cause the data in the first and second columns of shift registers (columns 2a and 2b respectively) to be shifted down one place.

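Now data C0, C1, ... (loaded in time slot T0) is available at the output of the first column of shift registers (column 2a), data B0, B1, ... (loaded in time slot T1) is available at the output of the second column of shift registers (column 2b), and data A0, A1, ... (loaded in time slot T2) is available at the output of the third column of shift registers (column 2c).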

This procedure continues with data from the row processors being loaded into one column of shift registers while the data in the other three columns of shift registers is shifted down one place. The output from the shift registers constitutes the required corner turned data for each column processor 4a, 4b, 4c, 4d which is loaded into its associated memory 3a, 3b, 3c, 3d.
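The staggered load and shift procedure can be traced period by period with the following sketch. It is a word-level simplification: each bank of one-bit shift registers is modelled as a single list of words, a new data set is assumed to emerge in every clock period, and the bank loaded in period t is assumed to be bank t mod 4. The labels such as "B@T1" (the word from row processor 1b that emerged in period T1) and the function name `simulate` are introduced here for illustration.

```python
P = 4
ROW_NAMES = "ABCD"   # outputs of row processors 1a, 1b, 1c, 1d


def simulate(clock_periods):
    """Return the words written to memories 3a..3d, in arrival order."""
    banks = [[] for _ in range(P)]      # contents of banks 2a..2d
    memories = [[] for _ in range(P)]   # memories 3a..3d
    for t in range(clock_periods):
        load_bank = t % P               # assumed rotation of the LD signals
        for b in range(P):
            if b == load_bank:
                # LD: take the data set emerging in this period.
                banks[b] = [f"{name}@T{t}" for name in ROW_NAMES]
            elif banks[b]:
                # SH: shift the bank down one place.
                banks[b].pop(0)
            if banks[b]:
                # The word at the bank output goes to its memory.
                memories[b].append(banks[b][0])
    return memories


after_t2 = simulate(3)   # clock periods T0, T1 and T2
after_t7 = simulate(8)   # two full rotations of the four banks
```

After period T2 the last word written to memory 3a comes from row processor 1c, the last written to 3b from 1b, and the last written to 3c from 1a. In this model each memory receives one word per clock period once its bank has been loaded for the first time, so the four memories are written concurrently.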

Although the foregoing Example 1 has been described in terms of the apparatus for parallel processing of data according to the embodiment of Figure 1, it is to be understood that a similar method can be carried out with the apparatus for the parallel processing of data as illustrated in the second embodiment of Figure 4. The primary difference between the two embodiments is that in the embodiment of Figure 4 the memories 3a, 3b, 3c and 3d are associated with the row processors 1a, 1b, 1c and 1d. Additionally, although in the two illustrated embodiments the data input has been shown as to the row processors 1a, 1b, 1c and 1d, with the output from the column processors 4a, 4b, 4c and 4d, it is to be understood that the data input could be to the column processors and the data output from the row processors.

Additionally, although four rows 5a, 5b, 5c and 5d and four columns 6a, 6b, 6c and 6d have been described and illustrated with respect to the embodiments of Figures 1 and 4 a minimum of two such rows and two such columns may be provided or more than four such rows and columns if desired.

In any event each row will include one row processor and each column will include one shift register and column processor.

One memory will be provided for each column or row. The output ends of the rows 5a, 5b, 5c and 5d are connected, in the illustrated embodiments, to the shift register 2d of the column 6d. The rows are also connected intermediate the row ends at specific spacings there along to the shift register 2a of the column 6a, to the shift register 2b of the column 6b and to the shift register 2c of the column 6c. In the Figure 1 embodiment the memories 3a, 3b, 3c and 3d are connected respectively one between each of the shift registers and the column processors.

In the Figure 4 embodiment the memories are connected one between each of the row processors and the first column connection 6a. Each row processor and each column processor is operable to carry out one dimensional Fast Fourier Transforms.

The column processors or row processors each or all may be capable of performing one specific function only, which preferably may be selected from several possible predefined modes of operation.

Claims (20)

1. A method of parallel processing data, in which the data is organised into a two dimensional array having at least two rows and at least two transverse linking columns, first high level data processing is carried out on the rows or on the columns, corner turning is carried out on the first processed data to turn it from said rows into said columns or vice versa, and second high level data processing is carried out on the corner turned data in said columns or in said rows, with the first processed data in said rows or columns being stored, before or after corner turning, in separate memories associated one with each row or column.
2. A method according to Claim 1, in which said first high level data processing is carried out on each of said rows of data, the corner turning is carried out on the processed row data to turn it into column ordered data and said second high level data processing is carried out on the column ordered data.
3. A method according to Claim 2, in which said first high level processing is carried out by one row processor per row, said second high level processing is carried out by one column processor per column and the processed row data is stored in said separate memories associated one with each row, before corner turning.
4. A method according to Claim 2, in which said first high level processing is carried out by one row processor per row, said second high level processing is carried out by one column processor per column and the processed row data is stored in separate memories associated one with each column, after corner turning.
5. A method according to Claim 4, in which corner turning is carried out by feeding the processed data from each row in sequence, in parallel into a shift register associated one with each column to form a series of data sets and shifting the series of data sets from each shift register into the associated memory in column order, from whence the column ordered data can be read by the associated column processor.
6. A method according to Claim 1, in which said first high level processing is carried out on each of said columns of data, the corner turning is carried out on the processed column data to turn it into row ordered data and said second high level processing is carried out on the row ordered data.
7. A method according to Claim 6, in which said first high level processing is carried out by one column processor per column, said second high level processing is carried out by one row processor per row and the processed column data is stored after corner turning in said separate memories associated one with each row.
8. A method according to Claim 7, in which the corner turning is carried out by feeding the processed data from each column in sequence, in parallel into a shift register associated one with each row to form a series of data sets and shifting the series of data sets from each shift register into the associated memory in row order, from whence the row ordered data can be read by the associated row processor.
9. A method according to Claim 3, Claim 4 or Claim 7, in which one dimensional Fast Fourier Transforms are carried out on the data in each processor.
10. A method according to any one of Claims 1 to 9, in which one dimensional data is processed by first organising it into a two dimensional array.
11. A method according to any one of Claims 1 to 10, in which the data to be parallel processed is signal and/or image data.
12. A method according to any one of Claims 1 to 9, in which data in three or more dimensions is processed by first organising it into two dimensional arrays of data.
13. A method of parallel processing data substantially as hereinbefore described with reference to Figures 1 to 3 or Figure 4 of the accompanying drawings.
14. Apparatus for the parallel processing of data, including means for organising data into a two dimensional array having at least two rows and at least two transverse linking columns, first processing means for carrying out first high level data processing on the rows or the columns, corner turning means for carrying out corner turning on the first processed data to turn it from said rows into said columns or vice versa, second processing means for carrying out second high level processing on the corner turned data in said columns or in said rows, and at least two separate memories associated one with each row or column, which memories are located and operable to store the first processed data in said rows or columns before or after corner turning.
15. Apparatus according to Claim 14, wherein the first and second processing means are data processors located one in each row and column and wherein the corner turning means includes a plurality of shift registers located one in each column.
16. Apparatus according to Claim 15, wherein the array has at least two substantially parallel rows, with the first processing means data processors being located respectively one at each input end of each row, with the output end of each row being connected to the shift register of one column and with the rows being connected intermediate the row ends to the shift register of another column, and wherein the second processing means data processors are located respectively one at each output end of each column to receive the output from the associated shift register.
17. Apparatus according to Claim 16, wherein the memories are located one in each row between the associated row data processor and the row connections to the column shift register most remote from the output ends of the rows.
18. Apparatus according to Claim 16, wherein the memories are located one in each column between the associated column data processor input and the associated column shift register output.
19. Apparatus according to any one of Claims 15 to 18, wherein each data processor is operable to carry out one dimensional Fast Fourier Transforms.
20. Apparatus for the parallel processing of data, substantially as hereinbefore described and as illustrated in Figure 1 or Figure 4 of the accompanying drawings.
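
The sequence recited in Claims 14 and 19 (first processing on the rows, corner turning, second processing on the columns, with each processor carrying out a one dimensional Fast Fourier Transform) can be illustrated by a short software sketch. The sketch below is only an illustration under stated assumptions: it runs sequentially in Python, it models corner turning as a plain matrix transpose rather than the shift registers and separate memories of the claimed apparatus, and the function names fft_1d, corner_turn, fft_2d and dft_2d are hypothetical names chosen for this example, not terms from the specification. Row and column lengths are assumed to be powers of two.

```python
import cmath


def fft_1d(x):
    """Radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft_1d(x[0::2])
    odd = fft_1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def corner_turn(rows):
    """Regroup row-ordered data into columns (a matrix transpose)."""
    return [list(col) for col in zip(*rows)]


def fft_2d(array):
    """2-D FFT built from 1-D FFTs on rows, a corner turn, then 1-D FFTs on columns."""
    # First processing: one independent 1-D FFT per row.
    row_done = [fft_1d(row) for row in array]
    # Corner turning: each former column is now a contiguous list.
    turned = corner_turn(row_done)
    # Second processing: one independent 1-D FFT per column.
    col_done = [fft_1d(col) for col in turned]
    # Turn back so that result[u][v] is indexed like the input.
    return corner_turn(col_done)


def dft_2d(array):
    """Direct 2-D DFT from the definition, used only as a cross-check."""
    rows, cols = len(array), len(array[0])
    return [
        [
            sum(
                array[r][c]
                * cmath.exp(-2j * cmath.pi * (u * r / rows + v * c / cols))
                for r in range(rows)
                for c in range(cols)
            )
            for v in range(cols)
        ]
        for u in range(rows)
    ]
```

Because each row transform in the first stage, and each column transform in the second stage, depends only on its own row or column, the list comprehensions in fft_2d mark the points at which separate processors could work in parallel; only the corner turn requires data to move between them.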
GB9020776A 1989-11-08 1990-09-24 Method and apparatus for parallel processing data Expired - Fee Related GB2237908B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
GB898925227A GB8925227D0 (en) 1989-11-08 1989-11-08 Method and apparatus for parallel processing data
GB9020776A GB2237908B (en) 1989-11-08 1990-09-24 Method and apparatus for parallel processing data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB9020776A GB2237908B (en) 1989-11-08 1990-09-24 Method and apparatus for parallel processing data

Publications (3)

Publication Number Publication Date
GB9020776D0 GB9020776D0 (en) 1990-11-07
GB2237908A true GB2237908A (en) 1991-05-15
GB2237908B GB2237908B (en) 1993-06-16

Family

ID=26296178

Family Applications (1)

Application Number Title Priority Date Filing Date
GB9020776A Expired - Fee Related GB2237908B (en) 1989-11-08 1990-09-24 Method and apparatus for parallel processing data

Country Status (1)

Country Link
GB (1) GB2237908B (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004040456A2 (en) * 2002-10-28 2004-05-13 Quicksilver Technology, Inc. Distributed data cache architecture
US7653710B2 (en) 2002-06-25 2010-01-26 Qst Holdings, Llc. Hardware task manager
US7660984B1 (en) 2003-05-13 2010-02-09 Quicksilver Technology Method and system for achieving individualized protected space in an operating system
US7668229B2 (en) 2001-12-12 2010-02-23 Qst Holdings, Llc Low I/O bandwidth method and system for implementing detection and identification of scrambling codes
US7752419B1 (en) 2001-03-22 2010-07-06 Qst Holdings, Llc Method and system for managing hardware resources to implement system functions using an adaptive computing architecture
US7809050B2 (en) 2001-05-08 2010-10-05 Qst Holdings, Llc Method and system for reconfigurable channel coding
US7865847B2 (en) 2002-05-13 2011-01-04 Qst Holdings, Inc. Method and system for creating and programming an adaptive computing engine
US7937539B2 (en) 2002-11-22 2011-05-03 Qst Holdings, Llc External memory controller node
US7937591B1 (en) 2002-10-25 2011-05-03 Qst Holdings, Llc Method and system for providing a device which can be adapted on an ongoing basis
USRE42743E1 (en) 2001-11-28 2011-09-27 Qst Holdings, Llc System for authorizing functionality in adaptable hardware devices
US8108656B2 (en) 2002-08-29 2012-01-31 Qst Holdings, Llc Task definition for specifying resource requirements
US8225073B2 (en) 2001-11-30 2012-07-17 Qst Holdings Llc Apparatus, system and method for configuration of adaptive integrated circuitry having heterogeneous computational elements
US8250339B2 (en) 2001-11-30 2012-08-21 Qst Holdings Llc Apparatus, method, system and executable module for configuration and operation of adaptive integrated circuitry having fixed, application specific computational elements
US8276135B2 (en) 2002-11-07 2012-09-25 Qst Holdings Llc Profiling of software and circuit designs utilizing data operation analyses
US8543794B2 (en) 2001-03-22 2013-09-24 Altera Corporation Adaptive integrated circuitry with heterogenous and reconfigurable matrices of diverse and adaptive computational units having fixed, application specific computational elements

Families Citing this family (2)

Publication number Priority date Publication date Assignee Title
US6836839B2 (en) 2001-03-22 2004-12-28 Quicksilver Technology, Inc. Adaptive integrated circuitry with heterogeneous and reconfigurable matrices of diverse and adaptive computational units having fixed, application specific computational elements
US7403981B2 (en) 2002-01-04 2008-07-22 Quicksilver Technology, Inc. Apparatus and method for adaptive multimedia reception and transmission in communication environments

Cited By (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8589660B2 (en) 2001-03-22 2013-11-19 Altera Corporation Method and system for managing hardware resources to implement system functions using an adaptive computing architecture
US9665397B2 (en) 2001-03-22 2017-05-30 Cornami, Inc. Hardware task manager
US9037834B2 (en) 2001-03-22 2015-05-19 Altera Corporation Method and system for managing hardware resources to implement system functions using an adaptive computing architecture
US9396161B2 (en) 2001-03-22 2016-07-19 Altera Corporation Method and system for managing hardware resources to implement system functions using an adaptive computing architecture
US8543794B2 (en) 2001-03-22 2013-09-24 Altera Corporation Adaptive integrated circuitry with heterogenous and reconfigurable matrices of diverse and adaptive computational units having fixed, application specific computational elements
US7752419B1 (en) 2001-03-22 2010-07-06 Qst Holdings, Llc Method and system for managing hardware resources to implement system functions using an adaptive computing architecture
US9015352B2 (en) 2001-03-22 2015-04-21 Altera Corporation Adaptable datapath for a digital processing system
US7822109B2 (en) 2001-05-08 2010-10-26 Qst Holdings, Llc. Method and system for reconfigurable channel coding
US8249135B2 (en) 2001-05-08 2012-08-21 Qst Holdings Llc Method and system for reconfigurable channel coding
US7809050B2 (en) 2001-05-08 2010-10-05 Qst Holdings, Llc Method and system for reconfigurable channel coding
USRE42743E1 (en) 2001-11-28 2011-09-27 Qst Holdings, Llc System for authorizing functionality in adaptable hardware devices
US9594723B2 (en) 2001-11-30 2017-03-14 Altera Corporation Apparatus, system and method for configuration of adaptive integrated circuitry having fixed, application specific computational elements
US8225073B2 (en) 2001-11-30 2012-07-17 Qst Holdings Llc Apparatus, system and method for configuration of adaptive integrated circuitry having heterogeneous computational elements
US9330058B2 (en) 2001-11-30 2016-05-03 Altera Corporation Apparatus, method, system and executable module for configuration and operation of adaptive integrated circuitry having fixed, application specific computational elements
US8250339B2 (en) 2001-11-30 2012-08-21 Qst Holdings Llc Apparatus, method, system and executable module for configuration and operation of adaptive integrated circuitry having fixed, application specific computational elements
US7668229B2 (en) 2001-12-12 2010-02-23 Qst Holdings, Llc Low I/O bandwidth method and system for implementing detection and identification of scrambling codes
US8442096B2 (en) 2001-12-12 2013-05-14 Qst Holdings Llc Low I/O bandwidth method and system for implementing detection and identification of scrambling codes
US7865847B2 (en) 2002-05-13 2011-01-04 Qst Holdings, Inc. Method and system for creating and programming an adaptive computing engine
US8200799B2 (en) 2002-06-25 2012-06-12 Qst Holdings Llc Hardware task manager
US7653710B2 (en) 2002-06-25 2010-01-26 Qst Holdings, Llc. Hardware task manager
US8782196B2 (en) 2002-06-25 2014-07-15 Sviral, Inc. Hardware task manager
US10185502B2 (en) 2002-06-25 2019-01-22 Cornami, Inc. Control node for multi-core system
US8108656B2 (en) 2002-08-29 2012-01-31 Qst Holdings, Llc Task definition for specifying resource requirements
US7937591B1 (en) 2002-10-25 2011-05-03 Qst Holdings, Llc Method and system for providing a device which can be adapted on an ongoing basis
US8380884B2 (en) 2002-10-28 2013-02-19 Altera Corporation Adaptable datapath for a digital processing system
WO2004040456A3 (en) * 2002-10-28 2005-08-04 Quicksilver Tech Inc Distributed data cache architecture
US7904603B2 (en) 2002-10-28 2011-03-08 Qst Holdings, Llc Adaptable datapath for a digital processing system
US8706916B2 (en) 2002-10-28 2014-04-22 Altera Corporation Adaptable datapath for a digital processing system
WO2004040456A2 (en) * 2002-10-28 2004-05-13 Quicksilver Technology, Inc. Distributed data cache architecture
US8276135B2 (en) 2002-11-07 2012-09-25 Qst Holdings Llc Profiling of software and circuit designs utilizing data operation analyses
US7941614B2 (en) 2002-11-22 2011-05-10 QST, Holdings, Inc External memory controller node
US7937539B2 (en) 2002-11-22 2011-05-03 Qst Holdings, Llc External memory controller node
US7979646B2 (en) 2002-11-22 2011-07-12 Qst Holdings, Inc. External memory controller node
US7984247B2 (en) 2002-11-22 2011-07-19 Qst Holdings Llc External memory controller node
US7937538B2 (en) 2002-11-22 2011-05-03 Qst Holdings, Llc External memory controller node
US7660984B1 (en) 2003-05-13 2010-02-09 Quicksilver Technology Method and system for achieving individualized protected space in an operating system

Similar Documents

Publication Publication Date Title
EP0097834B1 (en) Circuits for accessing a variable width data bus with a variable width data field
EP0248906B1 (en) Multi-port memory system
US5175862A (en) Method and apparatus for a special purpose arithmetic boolean unit
US4380046A (en) Massively parallel processor computer
DE69535672T2 (en) Synchronous NAND DRAM architecture
US4056845A (en) Memory access technique
US4866603A (en) Memory control system using a single access request for doubleword data transfers from both odd and even memory banks
CA1175576A (en) Data processing system for vector operations
US3644906A (en) Hybrid associative memory
JP2576827B2 (en) Dual port computer memory device, access method, computer memory devices, and memory structures
DE69830962T2 (en) The semiconductor integrated circuit
US4215401A (en) Cellular digital array processor
EP1981030B1 (en) Daisy chain cascading devices
US4727474A (en) Staging memory for massively parallel processor
DE3724317C2 (en)
US5367494A (en) Randomly accessible memory having time overlapping memory accesses
EP0068764A2 (en) Vector processing units
US5956274A (en) Memory device with multiple processors having parallel access to the same memory area
US6061779A (en) Digital signal processor having data alignment buffer for performing unaligned data accesses
EP0263924B1 (en) On-chip bit reordering structure
US5283877A (en) Single in-line DRAM memory module including a memory controller and cross bar switches
US5111192A (en) Method to rotate a bitmap image 90 degrees
US4989180A (en) Dynamic memory with logic-in-refresh
EP0042442A1 (en) Information processing system
US5715188A (en) Method and apparatus for parallel addressing of CAMs and RAMs

Legal Events

Date Code Title Description
PCNP Patent ceased through non-payment of renewal fee

Effective date: 19940924