EP0550427A1 - Data-array parallel processing system - Google Patents

Data-array parallel processing system

Info

Publication number
EP0550427A1
EP0550427A1 EP19900911629 EP90911629A EP0550427A1 EP 0550427 A1 EP0550427 A1 EP 0550427A1 EP 19900911629 EP19900911629 EP 19900911629 EP 90911629 A EP90911629 A EP 90911629A EP 0550427 A1 EP0550427 A1 EP 0550427A1
Authority
EP
European Patent Office
Prior art keywords
data
page
processor
address
memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP19900911629
Other languages
English (en)
French (fr)
Inventor
John Walter Neave
Neil Francis Trevett
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
3Dlabs Ltd
Original Assignee
DuPont Pixel Systems Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by DuPont Pixel Systems Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/80 Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
    • G06F15/8007 Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors single instruction multiple data [SIMD] multiprocessors

Definitions

  • This invention relates to a data-array processing system employing parallel processors and a common memory for storing the data-array.
  • the invention is more particularly, but not exclusively, concerned with processing arrays of data in which the relative positions of the data elements in the array are significant in addition to the values of the data elements, for example, as in pixel or vector processing.
  • each of a group of N data elements can be processed in parallel by a respective one of N processors and that a group of N locations in the memory can be accessed in parallel by the N processors.
  • SIMD: single-instruction-multiple-data
  • the present invention is concerned with freeing the system from the above described restriction, and in accordance with the invention the system includes an exchange under control of the processors for routing the data elements between any selected processor and any selected memory location in the accessed group, preferably with each processor being capable of controlling the exchange to determine the routing of its respective data element. Accordingly, any data element input to the exchange can be routed to any output of the exchange and therefore a single data element can be replicated to all positions, or the group of data elements can be shifted or translated, or the group of data elements can be routed straight through the exchange, etc., all under the control of the processors.
  • the processors may still be arrayed as a SIMD array, but with each selecting the routing of its data element, for example in accordance with the position of the processor in the array or in accordance with other conditions subsisting at the processor. Therefore the flexibility of the system is greatly increased.
  • the flexibility of the system is preferably further increased by arranging the exchange to be operable under control of the processors to transfer data elements between the processors.
  • one or each of some of the processors can control the exchange to route to it any selected data element supplied to the exchange by another of the processors. This therefore obviates the need during transfer of a data element from one processor to another for that data element to be written to the memory by the one processor and then to be read from the memory by the other processor.
  • the exchange comprises a plural number N of data sections, each preferably comprising an N:1 multiplexer, and each associated with a respective one of the processors, with each exchange data section being responsive to a selection signal from its respective processor to route a data element of the group between that processor and any selected memory location in the accessed group.
  • each processor is preferably operable to load into a respective register a value indicating its required routing, the registered value being used as a control input to the respective multiplexer.
  • the registered values may be supplied to the multiplexer via a further multiplexer or switch which receives as a further input thereto a predetermined value indicative of the straight-through route.
  • each processor in order to deal with writing to some but not all of the accessed group of memory locations, each processor is operable to supply a write enable signal, and the exchange is also operable to route the write enable signal from each processor to the same selected memory location in the accessed group as is selected for the routing of the data element to control writing of the data element to that memory location.
  • the exchange may provide for this routing of the write enable signals by including a plural number N of write-enable sections, each preferably comprising an N:1 multiplexer, and each associated with a respective one of the processors, with each exchange write-enable section being responsive to the same selection signal as the respective exchange data section.
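  • The routing described above can be sketched at a purely functional level as follows. This is a minimal illustration, not the circuit of the patent: the function names, the list-based model and the rule that a later processor overwrites an earlier one when two processors select the same location are all assumptions made for the example.

```python
N = 16  # number of processors, and of memory locations in the accessed group

# Selection values for the straight-through route: processor p <-> location p.
STRAIGHT_THROUGH = list(range(N))


def exchange_read(memory_group, select):
    """Model one N:1 multiplexer per processor on a read.

    Processor p receives the element held at location select[p].
    """
    return [memory_group[select[p]] for p in range(len(select))]


def exchange_write(memory_group, data, select, write_enable):
    """Route each processor's element and its write enable to location select[p].

    Locations that receive no active write enable keep their old contents.
    Assumption: if two processors select the same location, the
    higher-numbered processor wins.
    """
    result = list(memory_group)
    for value, sel, we in zip(data, select, write_enable):
        if we:
            result[sel] = value
    return result
```

  • With this model, `exchange_read(mem, [3] * N)` replicates one element to every processor, and `exchange_read(mem, [(p + 1) % N for p in range(N)])` shifts the group by one position.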
  • Figure 1 is a high-level schematic illustration of a computer system embodying the invention
  • Figures 2 and 3 are illustrations of modified forms of the system of Figure 1;
  • Figure 4 is an illustration in greater detail of a renderer employed in the systems of Figures 1 to 3;
  • Figure 5 is an illustration in greater detail of a front-end processor board employed in the systems of Figures 1 to 3;
  • Figures 6A and 6B show how patches of pixel data are made up;
  • Figures 7A and 7B show how pages of patch data, and groups or 'superpages' of page data, are made up;
  • Figure 8 is a schematic illustration of a physical image memory and the address lines therefor, used in the renderer of Figure 4;
  • Figure 9A is a 3-D representation of an aligned patch of data within a single page in the image memory
  • Figure 9B is a 2-D representation of a page, showing the patch of Figure 9A;
  • Figure 10A is a 3-D representation of a non-aligned patch of data within a single page in the image memory
  • Figure 10B is a 2-D representation of a page, showing the patch of Figure 10A;
  • Figure 11A is a 2-D representation of four pages in a virtual memory, showing a non-aligned patch which crosses the page boundaries and an enlargement of the circled part of the page boundary intersection;
  • Figure 11B is a 2-D representation of the physical memory illustrating locations of the four pages shown in Figure 11A;
  • Figure 11C is a 3-D representation of the non-aligned patch of Figure 11A;
  • Figure 12 is a truth table showing how page selection is made for patches which cross page boundaries;
  • Figure 13 shows two truth tables for selecting, respectively, X and Y patch address incrementation
  • Figure 14 is a schematic illustration in greater detail of part of the renderer of Figure 4;
  • Figure 15 is a schematic illustration in greater detail of an address translator of Figure 14;
  • Figure 16 is an illustration of the operation of a contents addressable memory used in the address translator of Figure 15;
  • Figure 17 is a schematic illustration in greater detail of a read surface shifter used in Figure 14;
  • Figure 18 shows in greater detail an array of multiplexers forming part of the surface shifter of Figure 17;
  • Figure 19 illustrates the translation made by the surface shifter of Figure 17;
  • Figure 20 is an illustration of the operation of a least-recently-used superpage table which may be used with the address translator of Figure 15;
  • Figure 21 is a schematic diagram showing a page fault table which may be used with the address translator of Figure 15;
  • Figure 22 is a schematic diagram of an exchange and grid processor of the renderer of Figure 4;
  • Figure 23 is a flow diagram illustrating the operation of the processors and a priority encoder of the grid processor of Figure 22;
  • Figure 24 is a table giving an example of the operation of the priority encoder of Figure 22;
  • Figure 25 illustrates the correlation between aligned memory cells and two levels of a patch in a 16-bit split patch system
  • Figures 26 and 27 show how pages of patch data, and superpages of page data are made up in a 16-bit split patch system
  • Figures 28 and 29 correspond to Figures 26 and 27 respectively in an 8-bit split patch system
  • Figures 30A to 30C show modifications of parts of the address translator of Figure 15 used in the split patch system;
  • Figure 31 is a table to explain the operation of a funnel shifter used in the circuit of Figure 30A;
  • Figures 32 and 33 illustrate non-aligned split patches in a 16-bit and an 8-bit patch system, respectively;
  • Figure 34 shows a further modification of part of the address translator of Figure 15 used in the split patch system
  • Figures 35A and 35B are tables which illustrate the operation of look-up tables in the circuit of Figure 34;
  • Figures 36A and 36B show modifications of a near-page-edge table of Figure 15A used in the split patch system;
  • Figure 37 illustrates, in part, a modification to the exchange and grid processor of Figure 22 used in the split patch system
  • Figures 38 and 39 are tables which illustrate the operation of further tables in a further modification of part of the address translator of Figure 15;
  • Figure 40 shows the further modification to Figure 15;
  • Figure 41 shows a modification to Figure 8 which is made in addition to the modification shown in Figure 40;
  • Figure 42 is a representation of the VRAM memory space, showing how pages of data are rendered in one section of the memory and then copied to another monitoring section of the memory;
  • Figure 43 shows a circuit for determining which pages need not be copied from the rendering section to the monitoring section and to the virtual memory
  • Figure 44 illustrates the setting and resetting of flags in a table of the circuit of Figure 43;
  • Figures 45A to 45C are flow diagrams illustrating the copying operations and Figure 45D shows the notation used in Figures 45A to 45C;
  • Figure 46 is a circuit diagram of a modification to the exchange of Figure 22;
  • Figures 47A to 47C are simplified forms of the circuit of Figure 46 when operating in three different modes;
  • Figure 48 shows a modification to part of the flow diagram at Figure 23;
  • Figure 49 is a schematic diagram of the processors and a microcode memory, with one of the processors shown in detail;
  • Figures 50A to 50D illustrate three images (Figs 50A to C) which are processed to form a fourth image (Fig. 50D);
  • Figure 51 is a system diagram showing in particular a page filing system.
  • Figures 1 to 3 show three different hardware configurations of computer systems embodying the invention.
  • a host computer 10 has its own backplane in the form of a VME bus 12 which provides general purpose communications between various circuit boards of the computer, such as processor, memory and disk controller boards.
  • the renderer 16 is connected to the VME bus 12 and the Futurebus+ 20, and also communicates with the video processor 18, which in turn drives an external colour monitor 24 having a high resolution of, for example, 1280 x 1024 pixels.
  • the front-end board 22 is also connected to the Futurebus+ 20 and can communicate with a selection of peripherals, which are illustrated collectively by the block 26.
  • the configuration of Figure 1 is of use when the host computer 10 has a VME backplane 12 and there is sufficient room in the computer housing 14 for the renderer 16, video processor 18, Futurebus+ 20 and front-end board 22, and may be used, for example, with a 'Sun Workstation'.
  • In the case where the computer housing 14 is physically too small, or where the host computer 10 does not have a VME or Futurebus+ backplane, the configuration of Figure 2 may be employed.
  • a separate housing 28 is used for the renderer 16, video processor 18, front-end board 22 and Futurebus+ 20, as described above, together with a VME bus 12 and a remote interface 30.
  • a host interface 32 is connected to the backplane 34 of the host computer 10, which may be of VME, Qbus, Sbus, Multibus II, MCA, PC/AT, etc. format.
  • the host interface 32 and remote interface 30 are connected by an asynchronous differential bus 36 which provides reliable communication despite the physical separation of the host and remote interfaces.
  • The configuration of Figure 2 is appropriate when the host computer 10 is, for example, an 'Apple Macintosh', 'Sun SPARCstation', 'IBM-PC', or Du Pont Pixel Systems bRISC.
  • the configuration of Figure 3 may be employed.
  • the renderer 16 and the front-end board 22 are directly connected to the Futurebus+ backplane 20.
  • the host computer 10 supplies data in the form of control information, high level commands and parameters therefor to the renderer 16 via the VME backplane (Figure 1), via the backplane 34, host and remote interfaces 32, 30 and the VME bus 12 (Figure 2), or via the Futurebus+ backplane 20 (Figure 3). Some of this data may be forwarded to the front-end board via the Futurebus+ 20 (Figures 1 and 2), or sent direct via the Futurebus+ backplane 20 (Figure 3) to the front-end board 22.
  • the Futurebus+ 20 serves to communicate between the renderer 16 and the front-end processor 22 and is used, in preference to a VME bus or the like, in view of its high bit width of 128 bits and its high bandwidth of about 500 to 800 Mbytes/s.
  • the renderer 16 includes an image memory, part of which is mapped to the monitor 24 by the video processor 18, and the renderer serves to perform image calculations and rendering, that is the drawing of polygons in the memory, in accordance with the commands and parameters supplied by the host computer 10 or the front-end board 22.
  • the front-end board 22 serves a number of functions. It includes a large paging RAM, which also interfaces with external disk storage, to provide a massive paging memory, and pages of image data can be swapped between the paging RAM and the image memory of the renderer 16 via the Futurebus+ 20.
  • the front-end board also has a powerful floating-point processing section which can be used for graphics transformation and shading operations. Furthermore, the front-end board may provide interfacing with peripherals such as a video camera or recorder, monitor, MIDI audio, microphone, SCSI disk and RS 232.
  • the renderer 16, video processor 18 and front-end board 22 can accelerate pixel handling aspects of an application, and also accelerate other computation intensive aspects of an application.
  • the renderer 16 includes a 32-bit internal bus 300, a VME interface 301 which interfaces between the VME bus 12 (Figure 1) or the remote interface 30 (Figure 2) and the internal bus 300, and a Futurebus+ interface 302 which interfaces between the Futurebus+ 20 and the internal bus 300. Also connecting to the internal bus 300 are a control processor 314 implemented by an Intel 80960i, an EPROM 303, 4 or 16 Mbyte of DRAM 304, a real time clock and an I/O block 306 including SCSI ports.
  • the functions of the control processor 314 and the associated DRAM 304 and EPROM 303 are: (a) to boot up and configure the system; (b) to provide resource allocation for local PRAM 318, 322 of address and grid processors 310, 312 (described in detail below) to ensure that there is no memory space collision; (c) to control the loading of microcode into microcode memories 307, 308 (described below); (d) to run application specific remote procedure calls (RPCs); and (e) to communicate via the I/O block 306 with a diagnostics port of the host computer 10 to enable diagnostics information to be displayed on the monitor 24.
  • the DRAM 304 can also be used as a secondary image page store for the VRAM 700 described below.
  • the renderer 16 also includes an address processing section 309 comprising an address broadcast bus 311 to which are connected 64 kbyte of global GRAM 316, a data/instruction cache 313 which also connects to the internal bus 300, an internal bus address generator which also connects to the internal bus 300, an address processor 310 with 16 kbyte of local PRAM 318, and a sequencer 317 for the address processor 310 which receives microcode from a microcode memory 307.
  • the address processor 310 also connects to a virtual address bus 319.
  • the main purpose of the address processing section 309 is to generate virtual addresses which are placed on the virtual address bus under control of microcode from the microcode memory 307.
  • an address translator 740 (described in further detail below) which receives the virtual addresses on the virtual address bus 319 and translates them into physical addresses of data in the video RAM 700, if the required data is present, or interrupts the address processor 310 to cause the required data to be swapped in from the paging RAM 304 or other page stores on the external buses, if the required data is not present in the VRAM 700.
  • the renderer 16 furthermore includes a data processing section 321 which is somewhat similar to the address processing section 309 and comprises a data broadcast bus 323, to which are connected 64 kbyte of global GRAM 324, a diagnostics register 325 which also connects to the internal bus 300 and which may be used instead of the I/O block 306 to send diagnostics information to the host computer 10, an internal bus address generator 327 which also connects to the internal bus 300, a grid processor 312 having sixteen processors each with 8 kbyte of local PRAM 322, and a sequencer 329 for the grid processor 312 which receives microcode from a microcode memory 308.
  • the processors of the grid processor 312 also connect to a data bus 331.
  • the main purpose of the data processing section 321 is to receive data on the data bus 331, process the data under control of microcode from the microcode memory 308, and to put the processed data back onto the data bus 331.
  • the physical VRAM 700 connects with the data bus 331 via an exchange 326 which is described in detail below, but which has the main purposes of shuffling the order of the sixteen pixels read from or written to the VRAM 700 at one time, as desired, to enable any of the sixteen processors in the grid processor 312 to read from or write to any of the sixteen addressed locations in the VRAM 700 and to enable any of the sixteen processors to transfer pixel data to any other of the sixteen processors.
  • the last main element of the renderer 16 is a bidirectional FIFO 332 connecting between the broadcast buses 311, 323 of the address and data processing sections 309, 321, which enables virtual addresses to be transferred directly between these two sections.
  • the front-end board 22 will now be described in greater detail with reference to Figure 5.
  • the front-end board 22 has an internal bus 502 which communicates with the Futurebus+ 20.
  • a paging memory section 504 is connected to the internal bus 502 and comprises a large paging RAM 506 of, for example, 4 to 256 Mbytes capacity which can be used in conjunction with the DRAM 304 of the renderer, a paging memory control processor 508, and connections to, for example, two external high speed IPI-2 disk drives 510 (one of which is shown) each of which may have a capacity of, for example, 4 Gbytes, and a data communication speed of 50 Mbytes/s, or two external SCSI drives.
  • the paging RAM 506 enables an extremely large amount of pixel data to be stored and to be available to be paged into the renderer 16 as required, and the fast disk 510 enables even more pixel data to be available ready to be transferred into the paging RAM 506.
  • Floating point processing is provided by 1 to 4 Intel 80860 processors 516, each rated at 80 MFlops peak.
  • the general purpose processing power can be used on dedicated tasks such as geometric pipeline processing, or to accelerate any part of an application which is compute-intensive, such as floating point fast Fourier transforms.
  • Each of the floating point processors 516 has a 128KByte secondary cache memory 518 in addition to its own internal primary cache memory.
  • the front-end board 22 may also, if desired, include a broadcast standard 24-bit frame grabber connected to the internal bus 502 and having a video input 514 and output 516 for connection to video camera or television-type monitor.
  • the front-end board 22 may also, if desired, include an input/output processor 520 which provides interfacing with MIDI on line 522, SCSI disk on line 524, at least one mouse on line 526, RS232 on line 528, and audio signals on line 530 via a bi-directional digital/analogue convertor 532.
  • the VRAM has a capacity of 16 Mbyte.
  • pixels are arranged in 4 x 4 groups referred to as 'patches'.
  • Figures 6A and 6B show, respectively, two- and one-dimensional notations for designating a pixel in a patch, as will be used in the following description.
  • the patches are arranged in 32 x 32 groups referred to as 'pages'.
  • the pages are arranged in 4 x 4 groups referred to as 'superpages'.
  • the VRAM therefore has a capacity of 4 Mpixels, or 256k patches, or 256 complete pages, or 16 complete superpages.
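  • The capacity figures above follow directly from the group sizes. The short calculation below reproduces them as a check; it assumes that "Mbyte" means 2^20 bytes and uses the 32-bit pixel size stated for the image memory. The constant names are chosen for this example only.

```python
BYTES_PER_PIXEL = 4           # 32-bit pixels
PIXELS_PER_PATCH = 4 * 4      # a patch is a 4 x 4 group of pixels
PATCHES_PER_PAGE = 32 * 32    # a page is a 32 x 32 group of patches
PAGES_PER_SUPERPAGE = 4 * 4   # a superpage is a 4 x 4 group of pages

vram_bytes = 16 * 2**20       # 16 Mbyte, assuming 1 Mbyte = 2**20 bytes

pixels = vram_bytes // BYTES_PER_PIXEL
patches = pixels // PIXELS_PER_PATCH
pages = patches // PATCHES_PER_PAGE
superpages = pages // PAGES_PER_SUPERPAGE

print(pixels, patches, pages, superpages)
```

  • The same pixel count is obtained from the bank organisation: sixteen banks of 512 x 512 pixels each.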
  • not all pages of a particular superpage need be stored in the memory at any one time, and support is provided for pages from parts of up to 128 different superpages to be stored in the physical memory at the same time.
  • Each small cube 702 in Figure 8 represents a 32-bit pixel.
  • the pixels are arranged in 512 pixel x 512 pixel banks B(0) - B(15) lying in the XY plane, and these pixel banks are 16 pixels deep (in the P direction).
  • a line of 16 pixels in the P direction provides an aligned patch 704.
  • the pixels in each bank are addressable as to X address by a respective one of 16 9-bit X address lines AX(0) to AX(15) and are addressable as to Y address by a respective one of 16 9-bit Y address lines AY(0) to AY(15).
  • the Y and X addresses are sequentially supplied on a common set of 16 9-bit address lines A(0) to A(15), with the Y addresses being supplied first and latched in a set of 16 9-bit Y latch groups 706-0 to 706-15 each receiving a row address strobe (RAS) signal on 1-bit line 708, and the X addresses then being supplied and latched in a set of 16 9-bit X latch groups, 707-0 to 707-15 each receiving a respective column address strobe signal CAS(0) to CAS(15) on lines 709(0) to 709(15), respectively.
  • RAS: row address strobe
  • Each Y latch group and X latch group comprises eight latches (shown in detail for Y latch group 706(1) and X latch group 707(1)), and a respective one of the X and Y latches is provided on each VRAM chip 710.
  • the banks of memory will sometimes be referred to by the bank number B(0) to B(15) and at other times by a 2-dimensional bank address (bx,by) with the correlation between the two being as follows:
  • a patch of 16 pixels is made available for reading or writing at one time. If the Y address and X address for all of the VRAMs 710 is the same, then an "aligned" patch of pixels (such as patch 704) will be accessed. However, it is desirable that access can be made to patches of sixteen pixels which are not aligned, but where various pixels in the patch to be accessed are derived from two or four adjacent aligned patches.
  • the pixels in the aligned patch all have the same address in the sixteen XY banks of the memory, as represented in Figure 9A, and when displayed would produce a 4 x 4 patch of pixels offset from the page boundaries by an integral number of patches, as represented in Figure 9B.
  • a further problem which arises in accessing a non-aligned patch "p" is that the (x,y) address of each pixel in the patch "p" does not correspond to the bank address (bx,by) in the memory from which that pixel is derived.
  • the following pixel derivations and translations are required.
  • Figure 11A represents four contiguous pages A, B, C, D in the virtual address space.
  • these pages may be scattered at, for example, page addresses (8,6), (4,8), (12, 12) and (6,10) in the VRAM, as represented in Figure 11B.
  • the non-aligned patch "p" may extend into page B, page C or pages B, C and D, depending on the direction of the misalignment.
  • the following table sets out, for each of the pixels in the patch "p" to be accessed: the page and patch address of the aligned patch from which that pixel is derived; the translation necessary from the patch address of the basic patch "a" in page A to the patch address of the patch from which the pixel is derived; the bank address from which the pixel is derived; and the translation necessary from this latter address to the address of the pixel in the patch "p".
  • A representation of the locations of the pixels in the four aligned patches is shown in Figure 11C.
  • the basic patch "a" has a patch address (px,py) of (31,31) and the non-aligned patch "p" to be accessed has a misalignment (mx, my) of (2,1) relative to the basic patch "a".
  • the increment is calculated using modular arithmetic of base 32. It is also to be noted that for all pixels where (mx, my) ≠ (0,0), a translation of (-mx, -my) is required between the bank address (bx,by) from which the pixel is derived and the address (x,y) of the pixel in the non-aligned patch "p".
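  • The derivation for the example patch, with (px,py) = (31,31) and (mx, my) = (2,1), can be sketched as below. This is an illustrative reconstruction from the relationships stated in the text, not a copy of the patent's table: the function name is hypothetical, and the sketch assumes that increasing x moves towards page B and increasing y moves towards page C.

```python
PATCH_SIDE = 4        # pixels along one side of a patch
PAGE_SIDE = 32        # patches along one side of a page


def derive_pixel(x, y, px, py, mx, my):
    """Locate pixel (x, y) of a non-aligned patch.

    Returns (page, source patch address, bank address), where page is one of
    "A", "B", "C", "D" as in Figure 11A.
    """
    sx, sy = x + mx, y + my
    # Bank from which the pixel is read; (bx - mx, by - my) mod 4 gives (x, y).
    bank = (sx % PATCH_SIDE, sy % PATCH_SIDE)
    # The patch address is incremented (mod 32) when the offset leaves patch "a".
    inc_x, inc_y = sx // PATCH_SIDE, sy // PATCH_SIDE
    patch = ((px + inc_x) % PAGE_SIDE, (py + inc_y) % PAGE_SIDE)
    # A page boundary is crossed only when the increment wraps from 31 to 0.
    cross_x = inc_x == 1 and px == PAGE_SIDE - 1
    cross_y = inc_y == 1 and py == PAGE_SIDE - 1
    page = {(False, False): "A", (True, False): "B",
            (False, True): "C", (True, True): "D"}[(cross_x, cross_y)]
    return page, patch, bank


for y in range(PATCH_SIDE):
    for x in range(PATCH_SIDE):
        print((x, y), derive_pixel(x, y, 31, 31, 2, 1))
```

  • For this example, pixel (0,0) stays in page A at patch (31,31), while pixel (3,3) comes from patch (0,0) of page D.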
  • the VRAM 700 is addressed by the address processor 310 via the address translator 740, communicates data with the grid processor 312 via the exchange 326 and provides data to the video processor 18.
  • a greater degree of detail of the address translator, VRAM and exchange is shown in Figure 14.
  • the address translator 740 receives a 48-bit virtual address on bus 319 of a patch origin address. The translator determines whether the required page(s) to access the addressed patch are resident in the VRAM physical memory 700. If not, a page or superpage fault is flagged on line 748, as will be described in detail below. However, if so, the address translator determines the addresses in the sixteen XY banks of the physical memory of the sixteen pixels making up the patch, and addresses the memory 700 firstly with the Y addresses on the sixteen sets of 9-bit lines A(0) to A(15) and then with the X addresses on these lines. The X and Y addresses are generated under control of the X/Y select signal on line 713.
  • the exchange 326 includes a read surface shifter 742 and a write surface shifter 744. Pixel data is transferred, during a read operation, from the memory 700 to the read surface shifter 742 by a set of sixteen 32-bit data lines D"(0) to D"(15), and, during a write operation, from the write surface shifter 744 to the memory 700 by the same data lines D"(0) to D"(15).
  • the read and write surface shifters 742, 744 receive 4-bit address data from the address translator on line 770, consisting of the least significant two bits of the X and Y address data. This data represents the misalignment (mx, my) of the accessed patch "p" from the basic aligned patch "a".
  • the purpose of the surface shifters is to re-order the pixel data in non-aligned patches, that is to apply the translation (-mx, -my) when reading and an opposite translation (mx, my) when writing.
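  • The two translations can be modelled as cyclic shifts of a 4 x 4 grid, one the inverse of the other. The sketch below is a software analogy only; the function names and the `grid[y][x]` list layout are assumptions made for the example, and the real shifters are multiplexer arrays.

```python
SIDE = 4  # a patch is 4 x 4 pixels, one pixel per memory bank


def read_shift(bank_data, mx, my):
    """Apply the translation (-mx, -my): bank (bx, by) -> patch pixel (x, y)."""
    patch = [[None] * SIDE for _ in range(SIDE)]
    for by in range(SIDE):
        for bx in range(SIDE):
            patch[(by - my) % SIDE][(bx - mx) % SIDE] = bank_data[by][bx]
    return patch


def write_shift(patch, mx, my):
    """Apply the opposite translation (mx, my): patch pixel (x, y) -> bank."""
    banks = [[None] * SIDE for _ in range(SIDE)]
    for y in range(SIDE):
        for x in range(SIDE):
            banks[(y + my) % SIDE][(x + mx) % SIDE] = patch[y][x]
    return banks
```

  • Writing a patch and reading it back with the same misalignment returns the original order, which is the property the pair of shifters is meant to provide.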
  • Pixel data to be written is supplied by a crossbar 327 forming part of the exchange 326 to the write surface shifter 744, and pixel data which has been read is supplied by the read surface shifter 742 to the crossbar 327, on the 512-bit line 750 made up of a set of 16 32-bit lines.
  • the write surface shifter also receives on line 745 16-bit write enable signals WE(0) - WE(15) from the crossbar 327, one for each pixel, and the write surface shifter 744 re-organises these signals in accordance with the misalignment (mx, my) of the patch "p" to be accessed to provide the sixteen column write enable signals WE"(0) to WE"(15). Each of these signals is then ANDed with a common CAS signal on line 709 to form sixteen CAS signals CAS(0) to CAS(15), one for each of the sixteen banks of memory. This enables masking of pixels within a patch during writing, taking into account any misalignment of the patch.
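  • A minimal model of this masking is given below. It works in 2-D (x, y) and (bx, by) coordinates rather than the 1-D signal numbers, because the correlation between the two numberings is not reproduced here; the function name and the boolean-grid representation are assumptions for the example.

```python
SIDE = 4  # 4 x 4 pixels per patch, one memory bank per pixel position


def bank_cas(write_enable, mx, my, common_cas=True):
    """Return the per-bank CAS strobes as a grid indexed [by][bx].

    write_enable[y][x] is the enable for pixel (x, y) of the patch being
    written. The enables are shifted by (mx, my) exactly as the pixel data
    is, then each is ANDed with the common CAS signal.
    """
    shifted = [[False] * SIDE for _ in range(SIDE)]
    for y in range(SIDE):
        for x in range(SIDE):
            shifted[(y + my) % SIDE][(x + mx) % SIDE] = bool(write_enable[y][x])
    return [[common_cas and we for we in row] for row in shifted]
```

  • For example, enabling only pixel (0,0) with a misalignment of (2,1) strobes only bank (2,1), so the other fifteen banks are left unchanged by the write.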
  • the address translator 740 will now be described in more detail primarily with reference to Figure 15.
  • the translator 740 includes, as shown, a contents addressable memory (CAM) 754, a page address table 756, a near-page-edge table 758, and X and Y incrementers 760X, 760Y.
  • the translator 740 also includes sixteen sections 764(0) to 764(15), one for each output address line A(0) to A(15), and thus for each memory bank B(0) to B(15).
  • the translator 740 receives a 48-bit virtual address of the origin (0,0) pixel of a patch on the bus 319. It will therefore be appreciated that up to
  • the bits identifying the superpage are supplied to the CAM 754.
  • the CAM 754 is an associative memory device which compares the incoming 30-bit word with all of the words held in its memory array, and if a match occurs it outputs the location or address in the memory of the matching value on line 767.
  • the CAM 754 has a capacity of 128 32-bit words. Thirty of these bits are used to store the virtual address of a superpage which is registered in the CAM 754. Thus up to 128 superpages can be registered in the CAM. One of the other bits is used to flag any location in the CAM which is unused. The remaining bit is spare.
  • Figure 16 illustrates how the CAM 754 operates.
  • this input value is compared with each of the contents of the CAM. If a match is found and provided the unused flag is not set, the address in the CAM of the match is output, e.g. 1 in the illustration. If no match is found with the contents at any of the 128 addresses of the CAM, then a superpage fault is flagged on line 748S, and the required superpage is then set up in the CAM in the manner described in detail later.
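  • The lookup behaviour described above can be sketched in software as follows. This is only a behavioural model: a real CAM compares all entries at once, whereas the loop here is sequential, and the class and method names are hypothetical.

```python
class SuperpageFault(Exception):
    """Raised when no registered superpage matches the input value."""


class SuperpageCam:
    SIZE = 128      # number of words in the CAM
    TAG_BITS = 30   # bits of each word holding a superpage virtual address

    def __init__(self):
        self.tags = [0] * self.SIZE
        self.unused = [True] * self.SIZE   # one flag bit per location

    def register(self, location, tag):
        """Store a superpage virtual address and clear the unused flag."""
        self.tags[location] = tag & ((1 << self.TAG_BITS) - 1)
        self.unused[location] = False

    def lookup(self, tag):
        """Return the location of the matching entry, or flag a fault."""
        tag &= (1 << self.TAG_BITS) - 1
        for location in range(self.SIZE):
            if not self.unused[location] and self.tags[location] == tag:
                return location
        raise SuperpageFault(hex(tag))
```

  • Because the returned location lies between 0 and 127, it fits in the 7 bits that are passed on to the page address table.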
  • the 7-bit superpage identification output from the CAM 754 on line 767 is used as part of an address for the page address table 756, implemented by a 4k word x 16-bit SRAM.
  • the remaining 5 bits of the address for the page table 756 are made up by: bits 7, 8, 23 and 24 of the virtual address which identify the page within a superpage; and an X/Y select signal on line 713.
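  • The 12-bit table address (7 + 4 + 1 bits, matching the 4k-word table) could be assembled as in the sketch below. The order in which the fields are packed, and the reading of bits 7, 8 as the X page index and bits 23, 24 as the Y page index, are assumptions made for illustration; the text states only which bits take part.

```python
def page_table_address(cam_location, virtual_address, y_select):
    """Combine the CAM output, the page-in-superpage bits and X/Y select."""
    page_x = (virtual_address >> 7) & 0b11    # bits 7 and 8
    page_y = (virtual_address >> 23) & 0b11   # bits 23 and 24
    # Assumed packing: [cam_location:7][page_y:2][page_x:2][y_select:1]
    return ((cam_location & 0x7F) << 5) | (page_y << 3) | (page_x << 1) | int(bool(y_select))
```

  • Whatever the packing, the twelve bits address exactly 4096 words, so each of the 128 registered superpages has separate X and Y entries for each of its sixteen pages.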
  • the page table 756 has registered therein the X and Y page addresses in the VRAM 700 of: a) the basic page A in which the pixel to be accessed is located; b) the page B which is to the right of the page A in the virtual address space; c) the page C which is above the page A in the virtual address space; and d) the page D which is to the right of page C and above page B in the virtual address space, and these addresses are output on lines 771A to 771D, respectively.
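  • By way of illustration, the formation of the 12-bit address for the 4k-word page table can be sketched as below. The order in which the three fields are packed into the address, and the packing of the four 4-bit page addresses A to D into one 16-bit word, are assumptions made for this sketch; the text states only which bits take part.

```python
def page_table_address(superpage_id, vaddr, y_select):
    """Build a 12-bit page table address (field order is an assumption)."""
    page_x = (vaddr >> 7) & 0x3     # bits 7 and 8: X page within the superpage
    page_y = (vaddr >> 23) & 0x3    # bits 23 and 24: Y page within the superpage
    return ((superpage_id & 0x7F) << 5) | (page_y << 3) | (page_x << 1) | (y_select & 1)


def unpack_page_word(word):
    """Split a 16-bit table word into four 4-bit page addresses (assumed layout)."""
    return {name: (word >> (4 * i)) & 0xF for i, name in enumerate("ABCD")}
```

  • Seven CAM bits, four page bits and one X/Y select bit give 12 address bits, which agrees with the 4k-word depth of the SRAM.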
  • Bits 2 to 6 and 18 to 22 of the virtual address are also supplied to each of the sections 764(0) to 764(15) on lines 772X and 772Y. These denote the patch address (px, py).
  • The X and Y patch addresses, together with bits 0, 1, 16 and 17 of the virtual address (which indicate the misalignment mx, my of the patch p to be accessed), are also supplied to the near-page-edge table 758, implemented using combinatorial logic, which provides a 2-bit output to the sections 764(0) to 764(15) on line 774, with one bit being high only if the patch X address px is 31 and the X misalignment mx is greater than zero, and the other bit being high only if the patch Y address py is 31 and the Y misalignment my is greater than zero.
  • the X and Y patch addresses (px, py) are also supplied to the X and Y incrementers 760X, 760Y, and these incrementers supply the incremented values px + 1, mod 32 and py + 1, mod 32, to each of the sections 764(0) to 764(15) on lines 776X, 776Y.
  • the four bits 0,1, 16 and 17 giving the misalignment mx and my are also supplied to the sections 764(0) to 764(15) on lines 770X, 770Y and are also supplied to the surface shifters 742, 744 on line 770.
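  • The field extraction, near-page-edge test and incrementers described above may be sketched as follows. The function names are hypothetical, and the sketch considers only the bit positions named in the text.

```python
def decode(vaddr):
    """Extract misalignment and patch address fields from a virtual address."""
    return {
        "mx": vaddr & 0x3,            # bits 0 and 1
        "px": (vaddr >> 2) & 0x1F,    # bits 2 to 6
        "my": (vaddr >> 16) & 0x3,    # bits 16 and 17
        "py": (vaddr >> 18) & 0x1F,   # bits 18 to 22
    }


def near_page_edge(px, py, mx, my):
    """2-bit near-page-edge signal: (X bit, Y bit)."""
    return (px == 31 and mx > 0, py == 31 and my > 0)


def incremented(px, py):
    """Outputs of the incrementers 760X, 760Y: px + 1 and py + 1, modulo 32."""
    return ((px + 1) % 32, (py + 1) % 32)
```

  • For example, a patch whose origin pixel has X coordinate 126 and Y coordinate 20 decodes to px = 31, mx = 2, py = 5, my = 0, so that only the X near-page-edge bit is high.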
  • Each section 764(0) to 764(15) comprises: a page selection logic circuit 778; X and Y increment select logic circuits 780X 780Y; X and Y 4:1 4-bit page address multiplexers 782X, 782Y; X and Y 2:1 5-bit patch address multiplexers 784X, 784Y; and a 2:1 9-bit address selection multiplexer 786.
  • the page selection logic circuit 778 implemented using combinatorial logic, provides a 2-bit signal to the page address multiplexers 782X,Y to control which page address A, B, C or D to use.
  • the page selection logic circuit 778 performs this selection by being configured to act as a truth table which corresponds to the table of Figure 12.
  • the circuit 778 receives the 2-bit signal on line 774 from the near-page-edge table 758 and this determines which of the four columns of the table of Figure 12 to use.
  • the circuit 778 also receives the misalignment (mx, my) on lines 770X, 770Y, and this data in combination with which section 764(0) to 764(15) (and thus which bx and by applies) determines which of the four rows in Figure 12 to use.
  • the X and Y page address multiplexers 782X, 782Y therefore supply appropriate page address as four bits to complementary inputs of the X/Y address selection multiplexer 786.
  • the increment selection logic circuits 780X, 780Y which are implemented using combinatorial logic, receive the respective X and Y misalignments mx, my and provide respective 1-bit signals to control the patch address multiplexers 784X, 784Y.
  • The increment selection circuits perform this selection by being configured to act as truth tables which correspond to the upper and lower parts, respectively, of the table of Figure 13. It will be noted that selection depends upon the misalignment mx or my in combination with the bx or by position of the memory bank (and thus which of the sections 764(0) to 764(15) is being considered).
  • The X and Y patch address multiplexers 784X, 784Y therefore output the appropriate 5-bit patch addresses px or px + 1 (mod.
  • This latter multiplexer receives as its control signal the X/Y selection signal on line 713 and therefore outputs the 9-bit X or Y address appropriate to the particular section 764(0) to 764(15).
  • the address translator 740 therefore deals with the problems described above of addressing pixels from different aligned patches a, b, c, d in the memory 700 when a patch "p" to be accessed is misaligned, and of addressing pixels from different pages A, B, C, D in the memory 700 when a patch "p" to be accessed extends across the boundary of the basic page A.
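  • The tables of Figures 12 and 13 are not reproduced here. The following sketch reconstructs the per-bank selection from the geometry alone and is therefore an assumption rather than a transcription of those tables: a bank at column bx holds a pixel of the next aligned patch whenever bx is less than mx, and that pixel lies in the neighbouring page only if the near-page-edge bit is also high.

```python
def bank_selection(bx, by, mx, my, npe_x, npe_y):
    """Return (page, use_px_plus_1, use_py_plus_1) for memory bank (bx, by).

    Reconstructed from geometry (assumption): a misaligned patch starting at
    offset mx covers bank columns mx..3 of patch px and 0..mx-1 of patch px+1.
    """
    wrap_x = bx < mx                 # pixel comes from the patch to the right
    wrap_y = by < my                 # pixel comes from the patch above
    cross_x = wrap_x and npe_x       # ...and that patch is in page B or D
    cross_y = wrap_y and npe_y       # ...and that patch is in page C or D
    page = "ABCD"[(2 if cross_y else 0) + (1 if cross_x else 0)]
    return page, wrap_x, wrap_y
```

  • With mx = 2 and the X near-page-edge bit high, for instance, banks in columns 0 and 1 would use page B and patch address px + 1, while columns 2 and 3 would stay in page A at px.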
  • the read surface shifter 742 comprises a pair of 4 x 4 32-bit barrel shifters, 788X, 788Y.
  • the X barrel shifter 788X has four banks 790X(0) to 790X(3) of multiplexers arranged in one direction, and the outputs of the X barrel shifter 788X are connected to the inputs of the Y barrel shifter 788Y, which has four banks 790Y(0) to 790Y(3) of multiplexers arranged in the orthogonal direction.
  • the X and Y barrel shifters 788X, Y receive the X and Y misalignments mx, my, respectively.
  • One of the banks of multiplexers 790X(0) is shown in greater detail in Figure 18, and comprises four 32-bit 4:1 multiplexers 792(0) to 792(3).
  • the data from bank (0,0) is supplied to inputs 0, 3, 2 and 1, respectively, of the multiplexers 792(0) to 792(3).
  • the data from bank (1,0) is supplied to inputs 1, 0, 3 and 2, respectively, of the multiplexers 792(0) to 792(3).
  • The data from bank (2,0) is supplied to the inputs 2, 1, 0 and 3, respectively, of the multiplexers 792(0) to 792(3).
  • the remaining data from bank (3,0) is supplied to the remaining inputs 3, 2, 1, 0, respectively, of the multiplexers 792(0) to 792(3).
  • the other banks of multiplexers 790X(1) to 790X(3) in the X barrel shifter 788X are similarly connected, and the banks 790Y(0) to 790Y(3) in the Y barrel shifter 788Y are also similarly connected. It will therefore be appreciated that the read surface shifter performs a translation with wrap-around in the -X direction of mx positions and a translation with wrap-around in the -Y direction of my positions as shown in Figure 19.
  • the write surface shifter 744 may be provided by a separate circuit to the read surface shifter.
  • the write surface shifter is configured similarly to the read surface shifter, except that the inputs 1 and 3 to the multiplexers 792 in the barrel shifter banks are transposed. This results in translations of +mx and +my in the X and Y directions, rather than -mx and -my for the read surface shifter.
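  • The multiplexer wiring described above can be modelled as follows. The sketch assumes that the misalignment value is applied directly as the select input of each 4:1 multiplexer; the function names are hypothetical.

```python
def mux_input(bank, k, write=False):
    """Input number of multiplexer 792(k) to which the given bank is wired."""
    n = (bank - k) % 4               # e.g. bank 0 feeds inputs 0, 3, 2, 1
    if write and n in (1, 3):
        n = 4 - n                    # inputs 1 and 3 transposed for writing
    return n


def shift_row(row, m, write=False):
    """One bank of multiplexers: output k takes whatever is on its input m."""
    out = []
    for k in range(4):
        bank = next(b for b in range(4) if mux_input(b, k, write) == m)
        out.append(row[bank])
    return out


def surface_shift(patch, mx, my, write=False):
    """X barrel shifter followed by Y barrel shifter on a 4 x 4 patch."""
    rows = [shift_row(r, mx, write) for r in patch]               # X stage
    cols = [shift_row(list(c), my, write) for c in zip(*rows)]    # Y stage
    return [list(r) for r in zip(*cols)]
```

  • Under this model the read shifter rotates by (-mx, -my) and the write shifter by (+mx, +my), so applying one after the other with the same misalignment restores the original patch.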
  • the part of the write surface shifter which operates on the write enable signals WE(0) to WE(15) is identical to the part which operates on the data signals, except that the signals are 1-bit, rather than 32-bit.
  • a single circuit may be employed, with appropriate data routing switches, and in this case translation provided by the surface shifter may be switched between (-mx, -my) and (+mx, +my), in dependence upon whether the memory is being read or written, as described with reference to Figures 46 and 47.
  • a superpage fault is flagged, on line 748S.
  • This superpage fault is used to interrupt the address processor 310, which is programmed to perform a superpage interrupt routine as follows. Firstly, the address processor checks whether the CAM 754 has any space available for a new superpage to be registered. If not, the address processor selects a registered superpage which is to be abandoned in the manner described below and causes the, or each, page of that superpage which is stored in the VRAM 700 to be copied to its appropriate location in the paging memory. The registration of that superpage is then cancelled from the CAM 754. Secondly, the new superpage is registered in the CAM 754 at the, or one of the, available locations.
  • LRU denotes "least recently used".
  • a 128 x 16-bit LRU table 802 is provided, as illustrated in Figure 20.
  • Each of the 128 addresses represents a respective one of the superpages registered in the CAM 754.
  • the 7-bit superpage identification output from the CAM 754 on line 767 is used to address the LRU table 802 each time the superpage identification changes, as detected by the change detector 804.
  • the change detector 804 also serves to increment a 16-bit counter 806, and the content of the counter 806 is written to the addressed location in the LRU table 802.
  • the LRU table contains an indication of the order in which those superpages were last used.
  • the address processor 310 checks the contents of the LRU table 802 to determine which superpage has the lowest count and in that way decides which superpage to abandon.
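  • A minimal software sketch of the LRU mechanism follows. The class `LruTable` is hypothetical. The 16-bit counter wraps in this model, and how the real system deals with such a wrap is not stated in the text.

```python
class LruTable:
    """Toy model of the LRU table 802, change detector 804 and counter 806."""

    def __init__(self, size=128):
        self.stamp = [0] * size      # one 16-bit entry per CAM location
        self.counter = 0
        self.last_id = None

    def access(self, superpage_id):
        """Called with the CAM output; acts only when the identification changes."""
        if superpage_id != self.last_id:
            self.counter = (self.counter + 1) & 0xFFFF
            self.stamp[superpage_id] = self.counter
            self.last_id = superpage_id

    def victim(self, registered):
        """Pick the registered superpage with the lowest count."""
        return min(registered, key=lambda i: self.stamp[i])
```

  • After accesses to superpages 3, 5, 3, 3 and 7 in that order, superpage 5 holds the lowest count and would be the one abandoned.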
  • the page fault generator is shown in Figure 21, and comprises a page fault table 794 constituted by a 2k x 4-bit SRAM, a set of three AND gates 796B, C, D and an OR gate 798.
  • the page fault table 794 is addressed by the 7-bit superpage identity code on line 767, and by the X and Y page addresses on line 768X, Y.
  • The page fault table 794 contains a 4-bit flag, the bits of which denote whether the basic addressed page A and the pages B, C and D, respectively, to the right, above, and to the right and above, page A are stored in the VRAM 700.
  • the page B flag is ANDed by gate 796B with the bit of the near-page-edge signal on line 774 denoting whether the patch "p" to be accessed extends across the boundary between pages A and B.
  • the page C flag is ANDed by gate 796C with the bit of the near-page-edge signal on line 774 denoting whether the patch "p" to be accessed extends across the boundary between pages A and C.
  • the page D flag is ANDed by gate 796D with both bits of the near-page-edge signal, which in combination denote whether the patch "p" to be accessed extends in page D above page B and to the right of page C.
  • The outputs of the three AND gates 796B, C, D and the page A flag are then ORed by the OR gate 798, the output of which provides the page fault flag on line 748P.
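  • The gating just described reduces to a single Boolean expression, sketched below. The sketch assumes that a flag bit is set when the corresponding page is absent from the VRAM, since the flags are ORed directly into the fault signal; the text does not state the polarity explicitly.

```python
def page_fault(flag_a, flag_b, flag_c, flag_d, npe_x, npe_y):
    """Page fault flag on line 748P (flag set = page missing, an assumption)."""
    return bool(
        flag_a                               # basic page A is always needed
        or (flag_b and npe_x)                # gate 796B: patch crosses into B
        or (flag_c and npe_y)                # gate 796C: patch crosses into C
        or (flag_d and npe_x and npe_y)      # gate 796D: patch crosses into D
    )
```

  • A missing page B thus raises a fault only when the patch actually extends across the boundary between pages A and B.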
  • the page fault signal on line 748P is used to interrupt the address processor 310.
  • The address processor searches a table in its PRAM 318 for a spare page address in the VRAM 700, causes the required page to be swapped into the VRAM at the spare page address, and updates the table in its PRAM 318.
  • the exchange 326 and the VRAM 700 communicate in patches of sixteen pixels of data, each pixel having 32 bits.
  • the grid processor 312 has sixteen processors, each of which processes pixel data and communicates with the exchange 326.
  • the grid processor 312 and the address processor 310 can communicate address data via the FIFO 332.
  • the exchange 326 includes a crossbar 377, and a logical implementation of the crossbar 377 and of the grid processor 312 is shown in more detail in Figure 22.
  • the crossbar 377 comprises sixteen 16:1 32-bit data multiplexers 602(0) to 602(15); sixteen 16:1 1-bit write enable multiplexers 603(0) to 603(15); a 512-bit bidirectional FIFO 604 for pixel data; and a 16-bit bidirectional FIFO 605 for the write enable signals.
  • the 16 pixels of a 4 x 4 patch are supplied from the VRAM 700 ( Figure 8) via the read surface shifter 742 and via the FIFO 604 as data D(0) to D(15) to the sixteen inputs of each data multiplexer 602(0) to 602(15).
  • the data multiplexers 602(0) to 602(15) supply data D(0) to D(15) via the FIFO 604 and the write surface shifter 744 to the VRAM and the write enable multiplexers 603(0) to 603(15) supply write enable signals WE(0) to WE(15) via the FIFO 605 to the write surface shifter 744.
  • The FIFOs 604, 605 and also the FIFO 332 are employed so that the grid processor 312 does not need to be stalled to take account of different access speeds of the VRAM 700 in dependence upon whether page-mode or non-page-mode access is taking place.
  • Each of the data multiplexers 602(0) to 602(15) is associated with a respective one of sixteen processors 606(0) to 606(15) and communicates therewith respective data signals D'(0) to D'(15), which are logically 32 bits, but which in practice may be implemented physically as 16 bits, with appropriate multiplexing techniques.
  • The data signals D'(0) to D'(15) are also supplied to respective parts of the bus 331.
  • Each of the write enable multiplexers 603(0) to 603(15) is associated with a respective one of the sixteen processors 606(0) to 606(15), which supply respective 1-bit write enable signals WE'(0) to WE'(15) to the write enable multiplexers.
  • Each processor 606(0) to 606(15) provides a logical control signal CO(0) to CO(15) to control both its associated data multiplexer 602 and write enable multiplexer 603.
  • any processor may provide any respective one of the data signals by providing the number 0 to 15 of the required data signal as its control signal to its data and write enable multiplexers.
  • any processor may read any of the data signals by providing the number 0 to 15 of the required data signal to its data multiplexer.
  • the crossbar 377 shown in Figure 22 is simplified for reasons of clarity, and shows, for example, bi-directional multiplexers, which in practice are difficult to implement.
  • a modified form of the exchange, incorporating the crossbar and the surface shifters, is shown in Figure 46.
  • the exchange of Figure 46 comprises sixteen sections, of which one typical section 326(i) is shown for simplicity.
  • the data D"(i) from the memory is supplied via a buffer BA(i) and register RA(i) to one input of a 2:1 multiplexer SA(i) acting as a two-way switch.
  • The output of the switch SA(i) is fed to an input i of the surface shifter 743 which performs surface shifting for read and for write.
  • The corresponding output i of the surface shifter 743 is fed to one input of a multiplexer switch SB(i) and is also fed back to the data D"(i) input via a register RB(i) and a tri-state buffer BB(i).
  • the output of the switch SB(i) is input to a FIFO(i), the output of which forms the other input of switch SA(i) and is also fed to one input of a further switch SC(i).
  • the set of sixteen data lines D(0) to D(15) connect the exchange sections 326(0) to 326(15) and the output of switch SC(i) is connected to data line D(i).
  • the output of each switch SC(0) to SC(15) is connected to the data line of the same number.
  • the sixteen inputs of a 16:1 multiplexer MUX(i) are connected to the data lines D(0) to D(15), and the output of the multiplexer MUX(i) is connected via a register RC(i) and a tri-state buffer BC(i) to the respective processor PROC(i) via the data line D'(i).
  • the output of the multiplexer MUX(i) is also connected to the other input of switch SB(i).
  • the data line D'(i) from the processor PROC(i) is also connected via a buffer BD(i) and a register RD(i) to the other input of the switch SC(i).
  • the control signal CO(i) for the multiplexer MUX(i) is provided by a switch SD(i) which can select between a hardwired value i or the output of a register RE(i) which receives its input from the output of the register RD(i).
  • Control signals CSB, CSC, CSD and CBC are supplied to the multiplexer switches SB(0) to (15), the multiplexer switches SC(0) to (15), the multiplexer switches SD(0) to SD(15) and the tri-state buffers BC(0) to (15), respectively, from the microcode memory 308 (Figure 4) of the processing section 321.
  • Control signals CSA, CBB and CSS derived from the microcode memory 307 of the address processing section 309 are supplied to the multiplexer switches SA(0) to (15), the tri-state buffers BB(0) to (15) and the surface shifter 743, respectively.
  • the exchange 326 of Figure 46 is operable in three modes.
  • In a read mode, the processors PROC(0) to PROC(15) can read the memory; in a write mode, they can write to the memory; and in a transfer mode, they can transfer pixel data between each other.
  • the values of the control signals for these three modes are as follows:
  • Control signal CSD can select between a "straight-through" mode in which each multiplexer MUX(i) selects its input i and thus data D(i), or a "processor-selection" mode in which it selects an input j and thus data D(j) in accordance with the value j which the processor has loaded into the register RE(i).
  • The effective configuration of a generalised one of the exchange sections 326(i) of Figure 46 in the read mode is shown in Figure 47A.
  • the data path from the data line D"(i) is via the register RA(i) to the surface shifter 743.
  • the surface shifter applies a shift of (-mx,-my) (mod. 4) to the data paths.
  • the data path continues via the FIFO(i) to the data line D(i).
  • The output data passes via the register RC(i) as data D'(i) to the processor PROC(i).
  • the effective configuration of the exchange section 326(i) in the write mode is shown in Figure 47B.
  • the data D'(i) from the respective processor PROC(i) passes via the register RD(i) to the data line D(i).
  • The output data passes via the FIFO(i) to the surface shifter 743.
  • The surface shifter applies a shift (+mx,+my) (mod. 4) to the data paths. From the surface shifter, the data path continues via the register RB(i) as data D"(i) to the VRAM 700.
  • the write-enable signal follows the same path WE'(i) to WE(i) to WE"(i) as the data signal path D'(i) to D(i) to D"(i).
  • these paths are logically 33 bits made up from 32 bits for the data signal and 1 bit for the write-enable signal.
  • In the transfer mode, the effective configuration of the exchange section 326(i) is as shown in Figure 47C.
  • The control signal CSD to the switch SD(i) is set to 1 so that the multiplexer MUX(i) receives as its control signal the value j loaded into the register RE(i).
  • In the first phase, the processors output the values j of the data D(j) which they wish to receive as the lowest four bits of their data lines, and these values j are clocked into the registers RD(i).
  • In the second phase, the processors output the data to be transferred out, and this data is clocked into the registers RD(i), while the values j are clocked out of the registers RD(i) and into the registers RE(i), thus setting the multiplexers MUX(i) to receive the data on the respectively selected lines D(j).
  • the data in the registers RD(i) is clocked out onto the lines D and each multiplexer MUX(i) receives and outputs the data on respectively selected line D(j).
  • Each processor PROC(i) thus receives the data D(j) from the processor PROC(j) which was selected by the processor PROC(i) by its output value j in the first phase.
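  • The phased transfer between processors can be illustrated by the following register-level sketch. The function `transfer` and its argument names are hypothetical, and clocking is reduced to sequential assignments.

```python
def transfer(selectors, outgoing):
    """Model of the transfer mode for all exchange sections at once.

    selectors[i] is the value j chosen by PROC(i);
    outgoing[i] is the data which PROC(i) sends out.
    Returns the data received by each processor.
    """
    n = len(outgoing)
    # First phase: j arrives on the lowest four bits of each data line
    # and is clocked into RD(i).
    rd = [s & 0xF for s in selectors]
    # Second phase: j moves from RD(i) into RE(i) while the outgoing
    # data is clocked into RD(i).
    re = list(rd)
    rd = list(outgoing)
    # Final step: RD(i) drives line D(i); MUX(i) picks line D(RE(i)).
    d = rd
    return [d[re[i]] for i in range(n)]
```

  • If every processor selects its right-hand neighbour, for example, the result is a rotation of the sixteen data values by one position; if every processor selects 0, the data of PROC(0) is broadcast to all.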
  • The processors 606(0) to 606(15) are connected to the data broadcast bus 323 and to a priority encoder 614 having 16 sections and which is associated with the sequencer 329.
  • the processors 606(0) to 606(15) communicate address data with the data broadcast bus 323 and the FIFO 332 connects the data broadcast bus 323 with the address processor 310.
  • The processors 606(0) to 606(15) can also supply respective "unsatisfied" signals US(0) to US(15) and respective "X waiting" signals XW(0) to XW(15) to the respective sections of the priority encoder and can receive respective "process enable" signals EN(0) to EN(15) from the respective sections of the priority encoder 614.
  • the priority encoder 614 has a sequencer enable (SE) output on line 618 to the sequencer 329 which controls the sequence of processing of a series of microcode instructions by the processors 606.
  • the purpose of the priority encoder 614 is to provide high efficiency in the accessing by the processors 606 of the memory 700.
  • the encoder 614 and processors perform the following process, which is shown in the flow diagram of Figure 23.
  • the left-hand three columns contain steps which are taken by the processors
  • the right-hand column contains steps performed by the priority encoder.
  • The process begins with a series of initialisation steps 620 to 628.
  • In steps 620 to 625, those processors which require access to the memory set (1) their respective unsatisfied signals US and reset (0) their X waiting signals XW, and those processors which do not require access reset (0) their unsatisfied signals US and their X waiting signals XW.
  • In steps 626 and 628, the priority encoder resets (0) the process enable signals EN for all of the processors and also resets (0) the sequencer enable signal SE.
  • The priority encoder 614 checks through the XW signals, starting with XW(0) in step 630, to find any processor which is X waiting, and if a match is found (step 632) at a processor, designated PROC(q), then the routine proceeds to step 640. If a match is not found, however, in step 632, then the priority encoder checks through the US signals, starting with US(0) in step 634, to find a processor which is unsatisfied, and if a match is found (step 636) for a processor, designated PROC(q), then the routine proceeds to step 640. If a match is not found, however, in step 636, then this indicates that all processors are satisfied, and accordingly the microcode program can proceed. Therefore, the sequencer enable signal SE is set in step 638, and the routine terminates.
  • In step 640, the process enable signal EN(q) for the selected processor PROC(q) is set.
  • In step 642, each processor determines whether it is unsatisfied, and if not exits the subroutine of steps 642 to 654. For any processor which is unsatisfied, then in step 644, that processor determines whether it is the selected processor, and if so supplies, in step 645, to the data broadcast bus 323 as (xq, yq) the virtual address of the base pixel (0,0) of the patch of pixel data which it wishes to process. This address is supplied via the FIFO 332 to the address processor 310, which in response accesses the appropriate locations in the memory 700, swapping in and out pages of pixel data, if required, as described above.
  • In step 656, the priority encoder resets (0) the process enable signal EN(q) for the selected processor.
  • the routine then loops back to step 630.
  • PROC(0) to (3) and (8) to (11) require access to the patches having the base pixel X and Y addresses listed in column 660 of the table, the addresses being in hexadecimal notation.
  • US(0) to (3) and (8) to (11) are set to 1 and the other US signals and the XW signals are reset to 0, as shown in column 662.
  • processors which require access to the same patch can access that patch simultaneously. Furthermore, when a plurality of processors require access to different patches having the same Y address, their accesses are made immediately one after the other, in "page mode". Therefore the address translator does not need to re-latch the Y address(es) in the Y address latches 706(0) to (15) ( Figures 8 and 14) between such accesses. Thus, a considerable improvement in performance is achieved compared with a case where the processors PROC(0) to (15) access their required patches one at a time, sequentially and without reference to any similarity between the addresses to be accessed.
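  • The arbitration may be sketched as below. Steps 646 to 654 are not spelled out in this excerpt, so the rule used here is an assumption inferred from the stated outcome: a processor whose address equals the broadcast address becomes satisfied, and a processor whose Y address alone matches becomes X waiting. The request addresses in the usage note are likewise made up for illustration.

```python
def schedule(requests):
    """requests: {processor index: (x, y)}.

    Returns the sequence of memory accesses as (address, processors served).
    """
    order = sorted(requests)
    us = {i: True for i in order}     # unsatisfied signals US(i)
    xw = {i: False for i in order}    # X waiting signals XW(i)
    accesses = []
    while any(us.values()):
        # The encoder prefers an X waiting processor, else any unsatisfied one.
        waiting = [i for i in order if us[i] and xw[i]]
        pending = [i for i in order if us[i]]
        q = waiting[0] if waiting else pending[0]
        xq, yq = requests[q]          # address broadcast by PROC(q)
        served = []
        for i in order:
            if not us[i]:
                continue
            xi, yi = requests[i]
            if (xi, yi) == (xq, yq):  # same patch: served by this access
                us[i] = False
                xw[i] = False
                served.append(i)
            else:                     # same Y only: next access can be page mode
                xw[i] = (yi == yq)
        accesses.append(((xq, yq), served))
    return accesses
```

  • With five requests of which two share a patch and two more share its Y address, the sketch issues three accesses instead of five, the second of which follows the first without a change of Y address.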
  • up to sixteen pixels in a patch are processed in parallel by sixteen processors.
  • the system is also arranged so that a group of patches, for example, up to 32 patches, are processed in series in order to reduce pipeline start and finish overheads.
  • The method of operation may be modified, as compared with that shown in Figure 23, in order to increase efficiency, by permitting any processor requiring access to, say, a jth pixel in the group to request that pixel without firstly waiting for all the other processors to complete access to their (j-1)th pixels in the group.
  • steps 623 and 630 in Figure 23 for each processor the step "set address of first required pixel in group as (xi, yi)" is included for each processor PROC(i).
  • steps 650 and 652 for each processor as shown in Figure 23 are replaced by the steps shown in Figure 48.
  • the memory is accessed at address xi, yi for the particular processor PROC(i).
  • In step 690, it is determined whether or not the new Y address yi is equal to the Y address yq of the last accessed pixel. If so, then in step 692, the X waiting flag XW(i) is set to 1, whereas if not, then in step 694, the X waiting flag XW(i) is reset to 0. After steps 692 or 694, the routine proceeds to step 656 as in Figure 23. It will therefore be appreciated that, once any processor has accessed a pixel in its series of required thirty-two pixels, it can immediately make itself ready to access the next pixel in its series, irrespective of how many of their required thirty-two pixels each of the other processors has accessed.
  • the process of Figures 23 and 48 is then modified as follows. In the initialisation steps 622 to 625, the additional steps are included of resetting to zero LP(i) and WM in the register files of all processors, and setting HP to the number of accesses in the series, usually 31.
  • The step 642 in Figure 23 is replaced by "LP(i)<>HP?".
  • In step 682 in Figure 48, where a processor accesses the memory, it also increments its local pointer to LP(i)+1.
  • the memory is capable of storing pixel data of 32 bits and that the grid processor is capable of processing pixel data logically of at least 32 bits.
  • pixel data having a resolution as great as 32 bits is not needed, and all that may be required is 16-bit or 8-bit pixel data.
  • The data is overlaid so that at no single address for each of the 128 VRAMs 710 does there exist data for more than one page.
  • This is achieved by overlaying the 8- or 16-bit pixel data in units of a pixel, or more preferably units of a patch, as described below.
  • An aligned set of memory cells C(0) to C(127), one from each VRAM chip, and each 4 bits wide, is shown. In the 32-bit arrangement described above, these cells form an aligned patch of 4 x 4 pixels.
  • L(0) is provided by C(0) to (3), C(8) to (11), C(16) to (19) .... C(120) to (123).
  • L(1) is provided by the remaining cells C(4) to (7), C(12) to (15), C(20) to (23) .... C(124) to (127).
  • Layer L(0) is displayed immediately to the left of the layer L(1), as shown in Figure 25.
  • the cells form four layers L(0) to (3) of 16 pixel x 4 pixel patch.
  • the layers are provided by the cells as follows:
  • Layer L(0): C(0), C(1), C(8), C(9) .... C(120), C(121)
  • Layer L(1): C(2), C(3), C(10), C(11) .... C(122), C(123)
  • Layer L(2): C(4), C(5), C(12), C(13) .... C(124), C(125)
  • Layer L(3): C(6), C(7), C(14), C(15) .... C(126), C(127)
  • The layers are displayed left to right in the order L(0), L(1), L(2), L(3).
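  • The allocation of cells to layers listed above follows a regular pattern, which the following sketch expresses as a formula. The formula is inferred from the listed cell numbers rather than stated in the text, and the function names are hypothetical.

```python
def layer_of_cell(c, bits_per_pixel):
    """Layer to which memory cell C(c) belongs, for c in 0..127."""
    if bits_per_pixel == 32:
        return 0                     # a single layer: no overlaying
    if bits_per_pixel == 16:
        return (c // 4) % 2          # groups of four cells alternate L(0), L(1)
    if bits_per_pixel == 8:
        return (c // 2) % 4          # pairs of cells cycle through L(0)..L(3)
    raise ValueError("unsupported pixel depth")


def cells_of_layer(layer, bits_per_pixel):
    """List the cell numbers which make up one layer."""
    return [c for c in range(128) if layer_of_cell(c, bits_per_pixel) == layer]
```

  • In 8-bit mode each layer then comprises 32 cells of 4 bits, that is 128 bits, or sixteen 8-bit pixels.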
  • the X patch address is, however, represented by bits 2-6 for 32-bit mode, by bits 3-7 for 16-bit mode, and by bits 4-8 for 8-bit mode. This makes available bit 2 in the 16-bit mode, and the two bits 2 and 3 in the 8-bit mode, to provide the level data, and leaves only one bit 8 in the 16-bit mode, and no bits in the 8-bit mode, for the X page address.
  • The patch and page arrangements and the address notations used for them are represented in Figures 26-29.
  • Figure 26 shows the arrangement of patches in a single 16-bit page
  • Figure 27 shows the arrangement of 8 pages in one complete 16-bit superpage
  • Figure 28 shows the arrangement of patches in a single 8-bit page
  • Figure 29 shows the arrangement of 4 pages in one complete 8-bit superpage.
  • the shift provided by the funnel shifter is controlled by a mode select signal MS on line 814 which is generated by a separate circuit in response to image header information provided prior to an image or graphics processing operation and which indicates whether the pixel data is 32-, 16- or 8-bit.
  • the funnel shifter provides a page X address of up to two bits, a 5-bit patch X address, and the level data L of up to two bits.
  • the relationship between the inputs to and outputs from the funnel shifter 812 is shown in the table of Figure 31, and it will be noted that it corresponds to the required shifting derivable from the table set out above.
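  • The effect of the funnel shifter on the low nine bits of the X address can be sketched as follows, using the bit positions given above for the three modes. The table of Figure 31 is not reproduced here, so the sketch is an assumption derived from those bit positions; the function name `split_x` is hypothetical.

```python
def split_x(x, bits_per_pixel):
    """Split bits 0 to 8 of the X address according to the pixel depth."""
    shift = {32: 0, 16: 1, 8: 2}[bits_per_pixel]   # number of level bits
    mx = x & 0x3                                    # bits 0 and 1 in all modes
    level = (x >> 2) & ((1 << shift) - 1)           # none, bit 2, or bits 2-3
    px = (x >> (2 + shift)) & 0x1F                  # bits 2-6, 3-7 or 4-8
    page_x = (x >> (7 + shift)) & ((1 << (2 - shift)) - 1)  # 2, 1 or 0 bits
    return {"mx": mx, "level": level, "px": px, "page_x": page_x}
```

  • The same nine address bits thus yield a 2-bit page X address in 32-bit mode, a 1-bit page X address and 1-bit level in 16-bit mode, and a 2-bit level with no page X address in 8-bit mode.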
  • the next complication arises due to the need to present the 16- or 8-bit pixels to the grid processor during reading such that the appropriate 16 or 8 bits of each pixel will be processed and not the remaining irrelevant 16 or 24 bits.
  • This complication is overcome by supplying, during a read operation, all 32 bits from a location in the memory to the grid processor, together with shift data ZSFT in response to which the grid processor shifts the read pixel data by an amount corresponding to the ZSFT data, and then processes predetermined bits of the shifted data, e.g. bits 0-15 for 16-bit processing, or bits 0-7 for 8-bit processing.
  • the patch p also extends into aligned patches c and d at patch addresses (12,17) and (13,17) respectively and at levels 1 and 0, respectively.
  • the determination of the further aligned patch addresses b, c, d is performed by the patch x and y address multiplexers 784 X,Y and the patch y address increment select tables 780 Y described above with reference to Figure 15 and by a modified form of the patch X address increment select table 780X which is responsive to the level data L and the mode select signal MS in addition to the X misalignment mx, as shown in Figure 30B.
  • the modified table 780X provides a 1-bit output to the X patch address multiplexer 784X in accordance with the truth table set out in Figure 30C.
  • the pixels labelled 6, 7, 10, 11, 14, 15, 2 and 3 require a ZSFT of 8 bits
  • the pixels labelled 4, 5, 8, 9, 12, 13, 0 and 1 require a ZSFT of 16 bits.
  • the circuit of Figure 15 includes the addition shown in Figure 34, in addition to being modified as described above with reference to Figures 30 and 31.
  • The level value L and also the bits 0,1 of the virtual address for the misalignment mx are supplied as addresses to four ZSFT tables 818 a to d implemented using combinational logic.
  • the ZSFT tables 818 also receive the mode select signal MS on line 814 and have three sections for 32-, 16- and 8-bit operation which are selected in dependence upon the MS signal.
  • The ZSFT table 818a supplies the ZSFT values ZSFT(0), (4), (8), (12) corresponding to data D(0), (4), (8), (12) supplied from the read surface shifter 742 to the exchange 326;
  • ZSFT table 818b supplies ZSFT(1), (5), (9), (13) for data D(1), (5), (9), (13);
  • ZSFT table 818c supplies ZSFT(2), (6), (10), (14) for data D(2), (6), (10), (14);
  • ZSFT table 818d supplies ZSFT(3), (7), (11), (15) for data D(3), (7), (11) and (15).
  • the table set out in Figure 35A defines the values of ZSFT stored in the ZSFT tables 818a to d for different input misalignments mx, levels L and modes (8-, 16- or 32-bit) and in dependence upon the x value for the particular ZSFT table.
  • the ZSFT values of 0, 1, 2, 3 represent a required shift of 0, 8, 16 and 24 bits respectively.
  • a further complication which arises when dealing with 8 or 16 bit data is that the X near-page-edge signal no longer needs to be dependent solely upon whether or not 4px + mx >124, but is also dependent upon the mode selected and the level data L.
  • The near-page-edge table 758 shown in Figure 15A is modified as shown in Figure 36A so as to receive the mode select signal MS on line 814 and the level signal L, in addition to the patch address (px,py) and the misalignment (mx,my).
  • the modified table 758 of Figure 36A produces X and Y values NPEx and NPEy of the 2-bit NPE signal as shown by the table set out in Figure 36B.
  • The modified arrangement is similar to the arrangement of Figure 15 except in the following respects. Firstly, a 16 x 2-bit ZSFT FIFO 678 is provided to receive ZSFT(0) to (15).
  • the output of the ZSFT FIFO 678 is supplied to each of sixteen 16:1 2-bit multiplexers 680(0) to 680(15).
  • The 2-bit outputs of the ZSFT multiplexers 680(0) to 680(15) are supplied to the respective processors PROC(0) to PROC(15) as signals ZSFT'(0) to (15).
  • The ZSFT multiplexers are controlled by the same logical control signals CO(0) to CO(15) as the associated data and write enable multiplexers.
  • Each processor receives the appropriate ZSFT data for the pixel data which it selects and can then shift the received pixel data by 0, 8, 16 or 24 bits in dependence upon the value 0, 1, 2 or 3 of the received ZSFT data so that the received pixel data then always occupies the first 8 bits of the processor's input register in 8-bit mode, or the first 16 bits of the input register in 16-bit mode.
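  • The shift performed by each processor on reading can be illustrated as follows. The sketch assumes that bit 0 is the least significant bit, so that the shift moves the wanted field down towards bit 0; the function name is hypothetical.

```python
def apply_zsft(word, zsft, bits_per_pixel):
    """Extract an 8-, 16- or 32-bit pixel from a 32-bit memory word.

    zsft takes the values 0, 1, 2, 3 for shifts of 0, 8, 16 and 24 bits.
    """
    mask = (1 << bits_per_pixel) - 1
    return (word >> (8 * zsft)) & mask
```

  • For a memory word of 0xAABBCCDD, a ZSFT value of 1 in 8-bit mode yields 0xCC, and a ZSFT value of 2 in 16-bit mode yields 0xAABB.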
  • A further complication which arises when dealing with 16-bit or 8-bit pixel data is that, during writing to the memory 700, only the appropriate 16 or 8 bits should be written, and the remaining 16 or 24 bits should not be overwritten.
  • the memory cells which are to store the 16-bit pixels labelled 6, 7, 10, 11, 14, 15, 2 and 3 need to have bits 16 to 31 written, with writing of bits 0 to 15 disabled, and the memory cells which are to store the pixels labelled 4, 5, 8, 9, 12, 13, 0 and 1 need to have bits 0 to 15 written, with bits 16 to 31 being disabled.
  • the memory cells which are to store the 8-bit pixels labelled 6, 7, 10, 11, 14, 15, 2 and 3 need to have bits 8 to 15 written, with bits 0 to 7 and 16 to 31 being disabled, and the memory cells which are to store the pixels labelled 4, 5, 8, 9, 12, 13, 0 and 1 need to have bits 16 to 23 written with bits 0 to 15 and 24 to 31 disabled.
  • Each PWE table 822 is provided with the bits 0,1 of the virtual address on bus 319 indicating the X misalignment mx, the value L from the circuit 820 of Figure 30, and the mode select MS signal on line 814.
  • The PWE tables contain the data as set out in Figure 38 and therefore a table having a particular value of bx can provide the 4-bit value PWE in dependence upon the input values of mx, L and MS.
  • In 8-bit mode the data is processed by the processors as the first 8 bits of their 32-bit capacity, and in 16-bit mode as the first 16 bits. Therefore, in order to ensure that, upon writing, the processors can write to bits 8 to 31 of the memory in 8-bit mode, or bits 16 to 31 of the memory in 16-bit mode, each processor which is to write first duplicates, in 8-bit mode, the pixel data of bit locations 0 to 7 of its output register at bit locations 8 to 15, 16 to 23 and 24 to 31 of the output register, and duplicates, in 16-bit mode, the pixel data of bit locations 0 to 15 at bit locations 16 to 31. Accordingly, when the enabled bits of the pixel data are written to the memory, the complete data for the pixel is written.
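The duplication and partial write can be sketched as follows. The sketch assumes that each bit of the 4-bit PWE value enables one 8-bit lane of the 32-bit memory word, with PWE bit i controlling bits 8i to 8i+7; the actual encoding is defined by Figure 38, which is not reproduced here. The names `replicate` and `masked_write` are hypothetical.

```python
def replicate(pixel: int, mode_bits: int) -> int:
    """Duplicate a narrow pixel across the 32-bit output register."""
    if mode_bits == 8:
        p = pixel & 0xFF
        return p | (p << 8) | (p << 16) | (p << 24)
    if mode_bits == 16:
        p = pixel & 0xFFFF
        return p | (p << 16)
    if mode_bits == 32:
        return pixel & 0xFFFFFFFF
    raise ValueError("mode must be 8, 16 or 32 bits")


def masked_write(old_word: int, new_word: int, pwe: int) -> int:
    """Write only the byte lanes enabled by the 4-bit PWE value.

    Assumption: PWE bit i enables bits 8*i to 8*i+7 of the memory word.
    """
    result = old_word & 0xFFFFFFFF
    for lane in range(4):
        if (pwe >> lane) & 1:
            lane_mask = 0xFF << (8 * lane)
            result = (result & ~lane_mask) | (new_word & lane_mask)
    return result & 0xFFFFFFFF
```

Because the pixel is present in every lane of the output register, whichever lanes the write enable selects receive the complete pixel, and the other lanes of the memory word are left unchanged.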
  • Pixel data in the VRAM is processed in patches, and a non-aligned patch may extend across a page boundary. Therefore the arrangement described below also includes, for each page, dirty flags for the pages B, C and D, as shown in Figure 11, which lie to the right of, above, and to the right of and above the page A in question. It should be noted that if page A has a virtual page address (PX,PY) then pages B, C and D have virtual page addresses (PX + 1, PY), (PX, PY + 1) and (PX + 1, PY + 1), respectively.
  • A dirty-page table 834 is provided by a 2K SRAM which is addressed by the 7-bit superpage identification on line 767 from the CAM 754, and the 2-bit page X address and 2-bit page Y address from the virtual address bus 319 on lines 768X,Y.
  • The eight data bits at each location in the table 834 are assigned as follows: bits 7 to 4 hold the dirty-swap flags dsA, dsB, dsC and dsD respectively, and bits 3 to 0 hold the dirty-render flags drA, drB, drC and drD respectively.
  • Bits 0 to 2 and 4 to 6 of the dirty page data are supplied to respective OR gates 836 (0) to (2) and 836 (4) to (6).
  • the signals dsB and drB are ORed with the near-page-edge X signal NPEX.
  • dsC and drC are ORed with the near-page-edge Y signal NPEY, and at gates 836 (4) and (0), the signals dsD and drD are ORed with an ANDed form of the near page edge X and Y signals on line 774.
  • the six bits output from the OR gates, together with a pair of high bits, representing the new signals dsA and drA, are then passed via a register 838 for writing back into the dirty page table 834 under control of a dirty pages write-enable signal DWE on line 840.
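The effect of the OR gates on one dirty-page entry can be sketched as below. The bit positions follow the flag numbering used in this description (bit 7 dsA down to bit 4 dsD, bit 3 drA down to bit 0 drD); the constant and function names are hypothetical, and the sketch models only the combinational update, not the register 838 or the DWE timing.

```python
# Bit masks for the eight dirty flags held at one table location.
DS_A, DS_B, DS_C, DS_D = 0x80, 0x40, 0x20, 0x10  # dirty-swap flags
DR_A, DR_B, DR_C, DR_D = 0x08, 0x04, 0x02, 0x01  # dirty-render flags


def update_dirty_flags(flags: int, npe_x: bool, npe_y: bool) -> int:
    """Return the flag byte written back after a patch in page A is rendered."""
    new = flags & 0xFF
    new |= DS_A | DR_A            # page A itself is always marked dirty
    if npe_x:
        new |= DS_B | DR_B        # patch may spill into page B (right)
    if npe_y:
        new |= DS_C | DR_C        # patch may spill into page C (above)
    if npe_x and npe_y:
        new |= DS_D | DR_D        # patch may spill into page D (right and above)
    return new
```

Flags that are already set stay set, which corresponds to ORing the stored bits with the near-page-edge signals.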
  • the 8-bit data line of the dirty page table 834 is also multiplexed onto the 48-bit virtual address bus 319, and the address processor is operable (a) to reset the appropriate dirty swap bits and set the appropriate dirty render bits when a new page is swapped from the paging memory to the VRAM, (b) to set the appropriate dirty swap bits and dirty render bits for a page when rendering operation is carried out on that page, (c) to test the appropriate dirty swap bits for a page when that page is to be replaced by a different page in the VRAM, and (d) to test the appropriate dirty render bits for a page when that page is to be copied from the rendering section to the monitoring section of the VRAM and to reset the dirty render bits.
  • When page P is copied from the paging memory into the VRAM, it is treated as page A for the purposes of Figure 45A.
  • bit 7 (dsA) of the dirty flag for page A is reset and bit 3 (drA) of the dirty flag for page A is set.
  • the address processor 310 determines whether there is a page B' stored in the physical memory, that is the page to the left of page A.
  • In step 846, bit 6 (dsB) and bit 2 (drB) of the dirty flag for page B' are reset and set respectively. Similar steps 848, 850 and 852, 854 are carried out for pages C' and D', that is the pages below, and to the left and below, page A in the paging memory. Then, in step 856, page A is copied from the paging memory to the VRAM. The process of Figure 45A is then repeated for pages Q, R and S. It will therefore be appreciated that the dirty flags for pages P to S attain the state as shown in column 902 of Figure 44.
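The swap-in bookkeeping of Figure 45A can be sketched as follows, under the assumption that the dirty flags are held in a dictionary keyed by virtual page address (PX, PY) and that a neighbouring page is "stored in the physical memory" exactly when it has an entry in that dictionary. The names are hypothetical.

```python
DS_A, DS_B, DS_C, DS_D = 0x80, 0x40, 0x20, 0x10  # dirty-swap flags
DR_A, DR_B, DR_C, DR_D = 0x08, 0x04, 0x02, 0x01  # dirty-render flags


def mark_swapped_in(flags: dict, page: tuple) -> None:
    """Reset dirty-swap and set dirty-render bits that refer to `page`."""
    px, py = page
    referrers = (
        ((px, py), DS_A, DR_A),          # page A itself
        ((px - 1, py), DS_B, DR_B),      # B': page to the left
        ((px, py - 1), DS_C, DR_C),      # C': page below
        ((px - 1, py - 1), DS_D, DR_D),  # D': page to the left and below
    )
    for addr, ds, dr in referrers:
        if addr == page or addr in flags:  # neighbours only if resident
            current = flags.get(addr, 0)
            flags[addr] = (current & ~ds & 0xFF) | dr
```

After the call, the newly loaded page is marked as needing a copy to the monitoring section but not as needing a copy back to the paging memory.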
  • In step 862, bit 3 (drA) of the selected page (A) is tested, and if set page A is copied to the monitoring section in step 864, and in step 866 bit 3 (drA) for page A, bit 2 (drB) for page B' to the left of page A, bit 1 (drC) for page C' below page A and bit 0 (drD) for page D' to the left and below page A are reset.
  • Bit 2 (drB) of page A is tested, and if set page B relative to page A is copied to the monitoring section in step 870, and in step 872 bit 2 (drB) of page A and bit 3 (drA) for page B to the right of page A, bit 0 (drD) for page C' below page A, and bit 1 (drC) for page E below and to the right of page A are reset.
  • Somewhat similar steps 874 to 884 are performed for bits 1 and 0 (drC, drD), as shown in Figure 45B, and if set the respective page C or D is copied to the monitoring section and various bits are reset as shown. It will therefore be appreciated that when this process is carried out with the dirty flags in the state as shown in column 902 of Figure 44, all four pages P to S are copied to the monitoring section of the VRAM, and the dirty flags attain the states as shown in column 904.
  • the monitoring section of the VRAM is again updated in accordance with the process of Figure 45B.
  • the only dirty render flag bit set is drA for page Q, and therefore only page Q is copied, and the bit drA for page Q is reset, as shown in column 908.
  • page P is modified, and also a misaligned patch in page P modifies page Q.
  • bits 7, 6, 3 and 2 (dsA, dsB, drA, drB) of the page P dirty flag are set, as shown in column 910. Because bits drA and drB for page P are set, pages P and Q are copied to the monitoring section by the process of Figure 45B, and bits 3 and 2 (drA, drB) for page P are then reset, as shown in column 912.
  • bits 7 and 3 (dsA, drA) of the page S flag are set; bit 3 (drA) of the page Q flag is set, and bit 7 (dsA) of the page Q flag remains set, as shown in column 914. Because bits 3 (drA) of pages Q and S are set, pages Q and S are copied to the monitoring section of the VRAM, and these bits are then reset, as shown in column 916.
  • In step 886, a copy flag is reset.
  • In step 888, it is determined whether bit 7 (dsA) for page A is set, and if so in step 889 that bit is reset and the copy flag is set.
  • Steps 888 and 889 are then repeated as steps 890 to 895 for bits 6, 5 and 4 (dsB, dsC, dsD) respectively of the dirty page flags for pages B', C' and D' relative to page A.
  • In steps 896 and 897, if the copy flag has been set, page A is copied to the paging memory.
  • Page R is not copied because none of dsA for page R (step 888), dsB for page Q (step 890), and dsC and dsD for the pages below, and below and to the left, of page R (step 892 and 894) are set.
  • Page S is copied because dsA is set for page S (step 888). This bit is then reset (step 889). Accordingly, pages P, Q and S are copied back to paging memory, and the flags attain the status shown in column 918 of Figure 44.
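The copy-back test of steps 886 to 897 can be sketched as below. As an assumption for illustration, the dirty flags are held in a dictionary keyed by virtual page address (PX, PY); the function name `needs_copy_back` and the example addresses are hypothetical.

```python
DS_A, DS_B, DS_C, DS_D = 0x80, 0x40, 0x20, 0x10  # dirty-swap flags


def needs_copy_back(flags: dict, page: tuple) -> bool:
    """Test and reset every dirty-swap bit that refers to `page`.

    Returns the copy flag: True if the page must be copied to paging memory.
    """
    px, py = page
    checks = (
        ((px, py), DS_A),          # dsA of page A itself
        ((px - 1, py), DS_B),      # dsB of B', the page to the left
        ((px, py - 1), DS_C),      # dsC of C', the page below
        ((px - 1, py - 1), DS_D),  # dsD of D', the page to the left and below
    )
    copy_flag = False              # step 886
    for addr, bit in checks:
        if flags.get(addr, 0) & bit:
            flags[addr] &= ~bit & 0xFF
            copy_flag = True
    return copy_flag
```

With a page at (0,0) whose dsA and dsB bits are set, both that page and its right-hand neighbour at (1,0) are reported as needing a copy, and a second test of either page reports nothing further to do.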
  • the processors 606(0) to (15) of the grid processor 312 described above are arranged basically as a SIMD array, SIMD standing for 'Single Instruction - Multiple Data' and meaning that all of the processors receive the same instruction and apply it to their own particular data elements. This can be an efficient and simple way of obtaining good performance from a parallel-processing machine, but it does assume that all of the data elements need exactly the same instruction sequence.
  • the processors are preferably arranged, as described below, to be able to deal with conditional instructions. Further detail of such an arrangement is shown in Figure 49.
  • Figure 49 shows three of the processors PROC 0, PROC i and PROC 15, with PROC i being shown in greater detail, their PRAMs 322(0), (i), (15), the microcode memory 308 and the processing section broadcast bus 323.
  • the microcode memory 308 supplies microcode instructions of about 90 bits to each respective instruction decode logic (IDL) circuit 100 in each of the processors. The same microcode instruction is supplied to each processor.
  • the instruction decode logic is provided by a gate array which decodes the 90 bit instruction to provide about 140 control bits to various elements in the respective processor including an arithmetic logic unit ALU 102, a 32-bit pixel accumulator (pa) 104, a 1-bit condition accumulator (ca) 106 and a status select circuit 108 which is provided by a gate array.
  • The ALU 102 connects with the data bus D' via the exchange 326 to the VRAM 700, the pa 104 and a stack of pixel registers p0 to pn in the PRAM 322.
  • The main data paths for pixel data are from the data bus D' to the ALU 102 and the pa 104; from the pa 104 to the ALU 102, the data bus D' and selected pixel registers p0 to pn; from the ALU 102 to the data bus D' and the pa 104; and from selected pixel registers p0 to pn to the ALU 102.
  • Various status bits are output from the ALU 102 to the status select circuit 108, such as a "negative" bit, a "zero" bit and an "overflow" bit. Some of these status bits are also fed out externally. Also, external status bits such as the EN flag (see Figures 22, 23) are fed in to the status select circuit 108.
  • the status select circuit 108 can select a respective status bit and output it to the ca 106.
  • The ca 106 is associated with a stack of condition registers c0 to cn in the PRAM 322.
  • The ca 106 also connects to the IDL 100 and provides the write enable output WE' of the processor.
  • The main paths for condition and status bits are: from the ALU 102 to the status select circuit 108 and to the external outputs; from the external inputs to the status select circuit 108; from the status select circuit 108 to the ca 106; from the ca 106 to the condition stack registers c0 to cn, the write enable output WE' and the ALU 102; and from the condition stack registers c0 to cn to the ca 106.
  • the 1-bit input from the ca 106 to the IDL 100 is important.
  • This input condition bit enables the IDL 100 to modify the control outputs from the IDL 100 in dependence upon the value of the condition bit, and accordingly the arrangement provides direct support for microcode instructions from the microcode memory 308 to the IDL 100 which in high-level language would be represented by, for example, if (condition) then (operation X) else (operation Y).
  • The VRAM 700 contains three images: image A of Figure 50A which in this simple example is a rectangle of horizontal lines; image B of Figure 50B which is a rectangle of vertical lines; and image C of Figure 50C which is a mask in which the upper-left and lower-right corners are black (say pixel values of 0) and the remainder is white (say pixel values of 2^32 - 1).
  • Images A and B are merged using image C as a mask to form an output image D such that image A appears where the mask image C is black and image B appears where the mask image C is white.
  • the process performed by the processors under control of the microcode instructions from the microcode memory 308 to perform this operation can be considered, using high-level pseudo-language, to be as follows:
  • Steps 1 and 13 set up a loop for each patch (x,y) having its origin in the rectangle.
  • Each processor PROC 0 to PROC 15 will process a different pixel in the patch.
  • In step 2, a test is made to determine whether the particular processor's pixel in the patch is in the rectangle, and if so the ca 106 is set, otherwise it is reset. This value of ca will form the write-enable signal WE'.
  • In step 3, this value which is stored in the ca 106 is put onto the condition stack in c0 and an associated condition stack pointer is modified accordingly.
  • In step 4, the value of the processor's pixel in the current selected patch in image A is loaded into the pa 104, and in step 5 is transferred to the p0 register.
  • In step 6, the value of the processor's pixel in the current selected patch in image B is loaded into the pa 104, and in step 7 is transferred to the p1 register.
  • In step 8, the value of the processor's pixel in the current selected patch in the mask image C is loaded into the pa 104, and then in step 9 the zero status bit of the ALU 102 is selected by the status select circuit 108 and is loaded into the ca 106.
  • If the mask pixel is black, the ca 106 value becomes 1, and if it is white, the ca 106 value becomes 0.
  • In step 11, the signal which was put onto the condition stack at c0 in step 3 is pulled off the stack and placed in the ca 106 in order to constitute the write enable signal WE' and the condition stack pointer is modified accordingly.
  • In step 12, the pixel value in the pa 104 is transferred out to the image D at the appropriate pixel position for the processor in the current selected patch.
  • In this example the condition stack c0 to cn was used simply to store the initially generated value which will form the write enable signal, and only one register in the stack was employed. By virtue of the provision of more than one register in the condition stack, nesting of the conditional instructions is permitted.
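The per-processor behaviour for one patch can be emulated with a short sketch. Each list index stands for one processor; the loop runs the lanes one after another although the hardware executes them in lock-step. Step 10 is not spelled out in the text above, so the selection between p0 and p1 under control of the condition bit is an assumption inferred from the stated result (image A where the mask is black, image B where it is white). The function name `merge_patch` is hypothetical.

```python
def merge_patch(a, b, mask, d, in_rect):
    """Emulate steps 2 to 12 for one patch, one list element per processor."""
    for i in range(len(a)):
        cond_stack = []
        ca = bool(in_rect[i])      # step 2: is this pixel inside the rectangle?
        cond_stack.append(ca)      # step 3: push ca onto the condition stack
        pa = a[i]                  # step 4: load pixel of image A
        p0 = pa                    # step 5
        pa = b[i]                  # step 6: load pixel of image B
        p1 = pa                    # step 7
        pa = mask[i]               # step 8: load pixel of mask image C
        ca = (pa == 0)             # step 9: zero status bit, 1 for a black mask pixel
        pa = p0 if ca else p1      # step 10 (assumed): conditional select
        ca = cond_stack.pop()      # step 11: restore write enable WE'
        if ca:                     # step 12: write only where WE' is set
            d[i] = pa
    return d
```

Lanes outside the rectangle leave image D untouched because their restored condition bit disables the write.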
  • pages of data can be swapped between the VRAM 700, on the one hand, and the paging memory comprising the DRAM 304 ( Figure 4), and the paging RAM 504 and fast disk 510 ( Figure 5), on the other hand.
  • The total system is based on a distributed operating system denoted by the triangle 200.
  • Part of this system constitutes a host page manager module 202 running on the processor 10 of the host computer.
  • Another part constitutes a front-end page manager module 204 running on the i960 control processor 508 of the front-end board 22 and handling the paging RAM 504 and fast disk 510.
  • a further part constitutes a renderer page manager module 206 running on the i960 control processor 314 of the renderer board 16 and handling the VRAM 700 and the DRAM 304.
  • Each of these page manager modules 202, 204, 206 can make a request R to any other module for a page P of image data specified by the virtual page address (VPA) consisting of the following bits of the virtual address:
  • the module to which a request R is made determines whether it is responsible for the requested page, and if so it transfers the page of data P and responsibility therefor to the requesting module, but if not it indicates to the requesting module that it is not responsible for the requested page.
  • the renderer page manager module 206 checks with itself whether the required page is stored in the renderer DRAM 304, and if so swaps the page of data into the VRAM 700. If not, the module 206 checks with the front-end page manager module 204 whether it is responsible for the page, and, if so, the page of data is swapped from the RAM 506 or disk 510, as appropriate, into the VRAM 700.
  • the renderer module 206 asks the host module 202 for the page of data, which is then swapped into the VRAM 700.
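The request chain followed by the renderer can be sketched as below. The sketch assumes that each page manager keeps its pages in a dictionary keyed by virtual page address and that handing over a page removes it from the previous holder, which models the transfer of responsibility. The class and function names are hypothetical, and the distinction between DRAM 304 and VRAM 700 inside the renderer is not modelled.

```python
class PageManager:
    """Toy page manager: holds the pages it is currently responsible for."""

    def __init__(self, name):
        self.name = name
        self.pages = {}  # virtual page address -> page data

    def request(self, vpa):
        """Hand over the page and responsibility, or None if not responsible."""
        return self.pages.pop(vpa, None)


def fetch_for_renderer(vpa, renderer, front_end, host):
    """Try the renderer itself, then the front end, then the host."""
    for module in (renderer, front_end, host):
        page = module.request(vpa)
        if page is not None:
            renderer.pages[vpa] = page  # renderer is now responsible
            return module.name
    raise KeyError(vpa)
```

A typical use is `fetch_for_renderer(vpa, renderer, front_end, host)`, which returns the name of the module that previously held the page.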
  • For each page in the image, the front-end module 204 firstly checks with itself whether it is responsible for that page. If it is and the page is already stored on the disk 510, it stays there, and if the page in question is stored in the front-end RAM 506 the data of that page is copied to the disk 510.
  • If the module 204 is not responsible, it checks with the renderer module 206 whether the renderer module has responsibility for the page, and, if so, the page of data is copied from the VRAM 700 or DRAM 304 of the renderer to the disk 510. If not, the front-end module 204 requests the page in question from the host module 202, and the page of data is transferred to the disk 510.
  • the front-end module 204 and the renderer module 206 each maintain a table 208, 210 containing a list of the virtual page addresses of the pages, and against each address an indication of the location of that page.
  • the location data in the front-end table 208 would comprise an indication of whether the page is in the RAM 506 or on the disk 510. If in the RAM 506, the physical address of that page in the RAM would be included, and if on the disk 510, an indication of the location on the disk would be included.
  • the location data for each virtual page address in the renderer table 210 may contain an indication of whether the page is in the DRAM 304 or the VRAM 700 and the physical address of the page in the respective memory.
  • The physical address of the page need not necessarily be kept in the table 210, because this address can be determined by the module 206 from the CAM 754 and the page table 756 (Figure 15A) of the address translator 740, and indeed it is not necessary for the table 210 to include the virtual page address of the pages in the VRAM 700, because the module can check whether a page is present by referring to the CAM 754 and page table 756 and testing whether or not a page fault is generated.
  • An important feature of the filing system is that the host page manager module 202 is not responsible for the storage of whole pages of data. The host module 202 is used when an image is initially created.
  • the image is specified by the host processor 10 as being of a particular dimension, size, bit width (see Figures 25 to 41) and background colour.
  • the system software 200 allocates to that image the next available image ID (bits 32 to 47 of the virtual address).
  • the colour of every pixel in the new image is the background colour, and the host module 202 therefore merely sets up a table 212 containing the virtual page address of the or each page required in the new image, and against the or each page address the table 212 contains the 32-bit background colour of the image.
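The saving offered by the host table 212 can be illustrated with a sketch in which one 32-bit colour is kept per page and the full page is only built when it is handed over. That the page is expanded at the moment of the request is an assumption made for illustration; the page side of 128 pixels follows from the 32 x 32 patches of 4 x 4 pixels described for non-split-level pages. The function names are hypothetical.

```python
PAGE_SIDE = 128  # 32 patches x 4 pixels per patch along each axis


def new_image_table(page_addresses, background):
    """Build a host table: one 32-bit background colour per virtual page address."""
    return {vpa: background & 0xFFFFFFFF for vpa in page_addresses}


def materialise_page(table, vpa):
    """Expand a table entry into a full page and give up responsibility for it."""
    colour = table.pop(vpa)
    return [[colour] * PAGE_SIDE for _ in range(PAGE_SIDE)]
```

Until a page is requested, the host therefore holds a single word for it rather than 128 x 128 pixels.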
  • the filing system described above may be modified so that it works in conjunction with the dirty page-swap scheme, by including against each virtual page address in each table 208, 210, 212 a bit indicating whether that page is current.
  • the operation of each module 202, 204, 206 is then modified so that when a module has responsibility for a page, the current bit is set to 1 and when responsibility is transferred to a different module the current bit is reset to zero.
  • the renderer module 206 polls the other modules 202, 204 to check which has an entry in its table for the page with the current bit reset, and instructs that module to set the current bit, obviating the need to copy all of the data-elements for that page from the renderer module to the other module.
  • a single word representing the image background colour is stored for each new image.
  • a few words may be stored, for example as a patch, and representing, for example, a pattern which is to be repeated in the new image.
  • the non-split-level patches, pages and superpages described above are two-dimensional and have a pixel resolution of 32-bits, a patch size of 4 pixels x 4 pixels, a page size of 32 patches x 32 patches, and a superpage size of 4 pages x 4 pages. It will be appreciated that the system may be configured so as to operate for example with one- or three-dimensional patches, and/or pages and/or superpages, with patches, pages and superpages of different sizes, and with different pixel resolutions.
  • the system may be arranged to operate selectably in different configurations through appropriate use of funnel shifters, switches and the like.
  • examples of specific sizes of the memories have been given, but it will be appreciated that other sizes may be used.
  • Division into two and four in the X direction has been illustrated, but it will be appreciated that other divisors may alternatively or selectably be employed, that division in other directions may alternatively or selectably be employed, and that division on a pixel basis rather than a patch basis may alternatively or selectably be employed.
  • the dirty page facility described above deals with copying between the rendering section and monitoring section of the VRAM and also with swapping between the VRAM and the paging memory, but it will be appreciated that either of these two features may be employed without the other.
  • the page manager modules are run on specific processors, but it will be appreciated that each page manager module may be run on different processors, and that the modules may be combined.
  • PCT/GB90/ PCT/GB90/ , PCT/GB90 , PCT/GB90/ ,

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
EP19900911629 1990-08-03 1990-08-03 Datenfeld parallelverarbeitungssystem Withdrawn EP0550427A1 (de)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/GB1990/001209 WO1992002882A1 (en) 1990-08-03 1990-08-03 Data-array parallel-processing system

Publications (1)

Publication Number Publication Date
EP0550427A1 (de) 1993-07-14

Family

ID=10669543

Family Applications (1)

Application Number Title Priority Date Filing Date
EP19900911629 Withdrawn EP0550427A1 (de) 1990-08-03 1990-08-03 Datenfeld parallelverarbeitungssystem

Country Status (2)

Country Link
EP (1) EP0550427A1 (de)
WO (1) WO1992002882A1 (de)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4980817A (en) * 1987-08-31 1990-12-25 Digital Equipment Vector register system for executing plural read/write commands concurrently and independently routing data to plural read/write ports
US5029018A (en) * 1987-11-18 1991-07-02 Nissan Motor Company, Limited Structure of image processing system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO9202882A1 *

Also Published As

Publication number Publication date
WO1992002882A1 (en) 1992-02-20

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 19930308

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): DE GB

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 19940301