EP0201797A2 - High-Performance Computer System (Rechnersystem mit hoher Leistung) - Google Patents

High-Performance Computer System (Rechnersystem mit hoher Leistung)

Info

Publication number
EP0201797A2
Authority
EP
European Patent Office
Prior art keywords
processor
array
memory
nodes
operand
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
EP86105879A
Other languages
English (en)
French (fr)
Other versions
EP0201797A3 (en)
EP0201797B1 (de)
Inventor
Stephen Richard Colley
David Walter Jurasek
John Franklin Palmer
William Stanley Richardson
Doran Kenneth Wilde
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NCube Corp
Original Assignee
NCube Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NCube Corp filed Critical NCube Corp
Publication of EP0201797A2
Publication of EP0201797A3
Application granted
Publication of EP0201797B1
Anticipated expiration
Status: Expired - Lifetime

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/16Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163Interprocessor communication
    • G06F15/173Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F15/17337Direct connection machines, e.g. completely connected computers, point to point communication networks
    • G06F15/17343Direct connection machines, e.g. completely connected computers, point to point communication networks wherein the interconnection is dynamically configurable, e.g. having loosely coupled nearest neighbor architecture

Definitions

  • the invention relates to data-processing systems, and more particularly, to a high-performance, parallel-processing system.
  • Illiac IV designed by Burroughs Corporation.
  • the Illiac IV utilizes an array of 64 processors, each with a local memory, operating in parallel with each processor executing the same instruction.
  • SIMD: single-instruction-stream, multiple-data-stream
  • the Illiac IV is a very powerful computer and has been used to solve difficult scientific problems such as those involving fluid flow.
  • the Illiac IV system is difficult to program because of the SIMD architecture.
  • a more important disadvantage, however, is that the system lacks reliability. The mean time to failure is measured in hours.
  • the high-performance data-processing systems that have been successful fall into one of two categories.
  • the first category consists of very high-speed uniprocessors that are heavily pipelined.
  • the second category consists of special-purpose, inexpensive array processors that off-load data from a general-purpose processor for scientific processing.
  • the Cray 1 and Cyber 205 fall within the first category, and the Floating-Point Systems' AP-120 falls within the second category.
  • the Cray 1 system approaches the limits imposed by physical constants. Wires must be kept short and the processor must be tuned to get full performance. Programs must be vectorized to take advantage of the pipeline structure. If this is not done, the Cray 1 will run much slower than its maximum speed. Finally, because of its size and sensitivity, the Cray 1 requires expensive, special handling such as reinforced floors, liquid cooling, and hand tuning.
  • the second category of prior systems is also subject to the physical limits imposed by the speed of the single or small number of processors that make up the array.
  • MIMD: multiple-instruction-stream, multiple-data-stream
  • a data-processing architecture and implementation thereof is provided in which an array of processors having local memory are interconnected in a hypercube topology.
  • Each of the processors includes means for executing instructions, logic means for interfacing the processor with its local memory, and means for communicating with other processors in the array.
  • the component count is reduced to a minimum, thus reducing size and increasing the reliability of the system.
  • a unique advantage flows from utilizing the hypercube topology, which consists of interconnecting all corners of an N-dimensional cube. Since the number of interconnections per processor grows as log2(N), the distance across the array grows only as log2(N), whereas the distance across a prior-art array with a fixed number of interconnections per node grows as sqrt(N). Thus, random communication between nodes is much faster on a hypercube than on any array using a fixed number of interconnections per node.
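As an illustrative sketch (the function names and the mesh comparison are ours, not the patent's), the growth rates above can be checked numerically:

```python
import math

# Worst-case node-to-node distance (diameter) of a hypercube with N nodes:
# the longest route flips every bit of the label, so it is log2(N) hops.
def hypercube_diameter(num_nodes):
    return int(math.log2(num_nodes))

# For comparison, a square 2-D mesh of N nodes (a fixed number of links
# per node) has a diameter of roughly 2*(sqrt(N) - 1) hops.
def mesh_diameter(num_nodes):
    side = math.isqrt(num_nodes)
    return 2 * (side - 1)

for n in (64, 256, 1024):
    print(n, hypercube_diameter(n), mesh_diameter(n))
```

At 1024 nodes the hypercube diameter is 10 hops versus 62 for a same-size mesh, which is the point the passage above is making.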
  • the hypercube is a recursive array since an (N+1)-dimensional array is constructed by interconnecting two N-dimensional arrays. This makes it easy to partition a large array into smaller arrays for multitasking. Furthermore, since a small array is logically equivalent to a larger one, software can be written to run on any size array. This allows for very effective multitasking without programming overhead for task switching. This also results in highly reliable systems that gracefully degrade in the event of processor failures.
  • the I/O bind is alleviated by including an extra serial channel on each processor to handle system I/O. This structure results in a very high I/O bandwidth and improved fault-tolerance. Since each node is accessible through the I/O system, even if several processors fail the remaining processors can be logically reconfigured into an operational array.
  • NcubeTM products referred to in this specification can be obtained by writing to Ncube Corporation, 1815 NW 169th Place, Suite 2030, Beaverton, OR 97006.
  • the architecture of the system in which the present invention is embodied uses up to 1024 identical high speed processors (processing nodes) connected in an array to work on a single problem.
  • Each node is a general purpose processor with 128K bytes of local ECC memory.
  • the array is interconnected in a recursive topology called a hypercube (see Section 3.2) that can be divided into subarrays of 64, 128, 256, or 512 processors.
  • the software can easily adjust to the number of processors in the system.
  • One job can run on the entire array or several jobs can run at once on subsets of the array. This space sharing avoids the constant swapping overhead that occurs in conventional time sharing systems.
  • the modularity of this design enhances extensibility. Simply by rewiring the backplane, the architecture can support a larger array. Also, by connecting systems together with an Interprocessor Link it is possible to use multiple systems in parallel or as a very powerful pipeline.
  • One embodiment of a system in which the principles of the present invention are practiced is described in detail in Part II of this specification. Briefly, it consists of an air-cooled enclosure containing a backplane with up to 24 boards, a disk/tape subsystem and power supplies (FIGURE 1).
  • the backplane (16) shown in FIGURE 1 uses 16 slots to support a processor array of up to 1024 processors, each with 128K bytes of local memory. The other 8 slots are used for I/O. Each I/O slot is connected to a different subset of 128 processors in the array.
  • Each processor node in the processing array has a 1/2 Megaflop (floating-point operations) or 2 MIPS (integer operations) processor. Thus a fully loaded system with 1024 processors has a potential performance of 500 Megaflops or 2000 MIPS.
  • Each I/O bus into the array consists of two unidirectional data paths (one inbound, one outbound) that operate independently and in parallel. Each path can transfer data at up to 140 Megabytes/sec.
  • the processing array consists of processors with local memory interconnected in a topology called a hypercube.
  • One way to describe a hypercube is graphically. Hypercubes of low order can be illustrated as shown below (circles are nodes and lines are communication links):
  • Another way to describe the hypercube is by a recursive definition. Each processor has a label that is a binary number. Two processors are connected if their labels differ in only one place. The low-order hypercubes are listed below:
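The listing of low-order hypercubes is elided in this extract, but the labeling rule itself can be sketched mechanically (an illustrative aside, not part of the patent text):

```python
# Two processors are connected iff their binary labels differ in exactly
# one place, i.e. the XOR of the labels is a nonzero power of two.
def connected(a, b):
    d = a ^ b
    return d != 0 and (d & (d - 1)) == 0

# Enumerate the links of the order-3 hypercube (labels 0..7).
edges = [(a, b) for a in range(8) for b in range(a + 1, 8) if connected(a, b)]
print(sorted(edges))
```

An order-n hypercube has n·2^(n-1) links, so the order-3 case yields 12 edges.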
  • At least one System Control board (14), shown in FIGURE 1, must be installed in every system, and there can be up to 8 per system. Its primary purpose is to run the operating system described in section 5.3, including the cross compilers, assembler and linker, and to control a wide range of peripherals. It uses an Intel 80286 with 4 Megabytes of memory for the operating software. There are also four sockets for an EPROM that contains the monitor and diagnostics software described in section 5.2.
  • the System Control board has eight RS-232 serial ports and one high speed printer port. It has the logic to control up to 4 SMD disk drives and three Intel iSBX multimodule connectors. These multimodule connectors support options such as tape drive controllers, ethernet interfaces and interboard buses (for I/O boards). This bus allows for users attached to different System Control boards to access a common file system that is spread across several disk drives controlled by different controllers.
  • A block diagram of the System Control board is shown in FIGURE 12 and is described in detail in section 8.9.
  • the central processor of the System Control is an Intel 80286.
  • The Intel 80286 is a high-performance 16-bit processor that has a compatibility mode with the Intel 8086. In addition, it can address 16 megabytes of memory and has memory management and protection built into the chip. It is the host for a multitasking/multiuser operating system.
  • For details on the Intel 80286, see Intel's iAPX 286 Programmer's Reference Manual, which can be obtained from Intel Corporation, 3065 Bowers Ave., Santa Clara, CA 95051.
  • the System Control has 8 sockets for PROM (72) which may be loaded with devices from Intel 2764's up to Intel 27512's.
  • the PROM resides at location F80000 to FFFFFF in the Intel 80286 memory space. Since these are byte wide devices they are arranged in 2 banks with the following addresses.
  • The PROMs are programmed with, and contain, a monitor described in section 5.2 that includes the following functions:
  • Each node consists of a processor and 128 Kbytes of RAM. This memory is triple ported since it can be directly accessed by the local processor, the Intel 80286 and the SMD disk controller. All Intel 80286 and disk controller accesses to these local memories are 16 bits only.
  • the main purpose of this array (214) is to provide communication (an I/O bus) with the main Processing Array. Thus, 8 of the 11 channels on each node are dedicated to providing communication between the Processing Array and the System Control.
  • the nodes on the System Control board are numbered 0,1,...,15 and their local memory resides in the Intel 80286 address space according to the table below.
  • Since the local processing nodes on the System Control each have 3 communication channels uncommitted to I/O, they are interconnected in two order-3 hypercubes. That is, nodes numbered 0,1,...,7 form one hypercube and nodes numbered 8,9,...,15 form another. This allows users to test their programs on the small hypercubes on the System Control board before loading them into the main array, thus offloading most debugging tasks. It may appear that, since the two hypercubes are not directly interconnected, it would be difficult to move data from a node in one cube to a node in the other. However, since all the memory is in the Intel 80286 memory space, it is simple and fast to use the central processor or the DMA processor to move the data.
  • the devices that are inserted into the SBX connectors appear to a programmer as locations in the 80286 I/O space that can be read from and written into. They can also be controlled by the Intel 82258 ADMA (Advanced Direct Memory Access) chip.
  • the boards can generate interrupts to the Intel 80286 (for details on interrupts see section 3.3.2.1.10).
  • the data that is read or written can be either Byte or Halfword and each SBX connector has 32 reserved I/O addresses (3 bits of address and 2 bits of chip select) as shown below.
  • the System Control Board has an Intel 82258 ADMA device (80) that controls the Direct Memory Access in the system. It is specifically intended to control the 3 SBX connectors and the Centronics parallel port. However, it is a very powerful processor and can control other DMA functions in the system including moving blocks of data in memory. Refer to Intel's 82258 Manual for details.
  • the Intel 82258 has 4 DMA channels and each channel has a set of registers associated with it. Also there is a set of global registers. All of these registers are in the Intel 80286 I/O address space as shown in the table below.
  • the Channel Registers can be written by setting the Command Pointer to point to a command block in memory and then giving the Intel 82258 a START CHANNEL command from the Intel 80286.
  • the format of the command block in memory is shown below:
3.2.1.8 Serial Channels
  • the System Control Board shown in FIGURE 12 has 8 serial channels that are controlled by four Intel 8530 Serial Communications Controllers (82), each device handling 2 channels. Each 8530 also has two baud-rate generators and interrupt-generation logic. There is a set of control (Write) registers and a set of status (Read) registers for each channel. The registers are all 8 bits and all accesses are byte only. A summary of the register functions is listed below (unless noted, two copies of each register exist, one for each channel):
  • the only registers that can be directly addressed in the I/O space of the Intel 80286 are the two Data registers (RR8 and WR8) and RR0 and WR0. Reading or writing any other register requires two steps: (1) write the appropriate code into WR0, then (2) perform the read or write operation.
  • after the second step, bits 0 through 4 of WR0 are automatically cleared so that WR0 points to WR0 or RR0 again.
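A minimal sketch of this two-step access (ours, and deliberately simplified: a real 8530 keeps separate read and write register sets per channel, which this toy model collapses into one):

```python
class SCC8530:
    """Toy model of the 8530's indirect register access via the WR0 pointer."""
    def __init__(self):
        self.regs = {i: 0 for i in range(16)}
        self.pointer = 0            # WR0's register pointer (bits 0-4)

    def write(self, value):
        if self.pointer == 0:
            self.pointer = value & 0x0F      # step 1: select the target register
        else:
            self.regs[self.pointer] = value  # step 2: the actual write
            self.pointer = 0                 # pointer clears: back to WR0/RR0

scc = SCC8530()
scc.write(3)        # step 1: select register 3
scc.write(0x55)     # step 2: write into register 3
print(hex(scc.regs[3]), scc.pointer)
```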
  • the I/O addresses corresponding to the 8530 devices and the I/O channels are listed in the table below. Each channel can generate four interrupts to indicate the conditions: Tx empty, Status Change, Rx Ready and Special Rx.
  • the interrupt vector addresses (assuming the given vector bases are used) are also listed below. (To obtain the interrupt numbers divide the vector address by four.)
  • Each Serial Channel Controller (82) has an integrated baud rate generator that depends on the setting of a Time Constant, supplied by real-time clock (84).
  • the equation and a baud rate table are given below.
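The equation and table themselves are not reproduced in this extract. The relation below is the standard 8530 baud-rate-generator formula, and the 3.6864 MHz clock is an assumed example value, not taken from the patent:

```python
# Standard 8530 baud-rate-generator relation:
#   baud = pclk / (2 * clock_mode * (TC + 2))
# solved here for the time constant TC.
def time_constant(pclk_hz, baud, clock_mode=16):
    return round(pclk_hz / (2 * clock_mode * baud)) - 2

print(time_constant(3_686_400, 9600))    # → 10
print(time_constant(3_686_400, 19200))   # → 4
```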
  • the System Control has an SMD disk controller unit (216). It is capable of controlling up to four standard SMD disk drives. Data is accessed in blocks of 1024 bytes at a rate of 1 Megabyte per second.
  • the disk controller unit consists of 6 registers in the I/O address space which are described below.
  • In addition to the registers listed above there are two more I/O addresses and an interrupt associated with the disk controller. Writing to the I/O addresses causes the disk controller to be enabled or disabled. Reading from these addresses yields a system ID (see 3.3.2.1.10). The interrupt indicates that a disk access has completed.
  • the addresses and interrupt number are given below.
  • There is a set of I/O addresses associated with various aspects of system control.
  • the controls include power supply monitoring and sequencing, error sensing and signaling, and board resets. All of the addresses and controls are described below:
  • the system contains sixteen temperature-sensing devices located throughout the enclosure. They are used to prevent system damage caused by overheating. In order to trigger a sensor and take a measurement, software must perform the following steps:
3.2.1.12 Real Time Clock
  • the System Control board has a real time clock (84) that is controlled and sensed by writing and reading the following I/O addresses.
  • the System Control has two timers in addition to the Real Time Clock. They are provided by an Intel 8254 that has 4 I/O addresses associated with it. The two timers are called the Watchdog timer and the Schedule timer. They both use the same prescaler, but the Watchdog generates a Non-Maskable Interrupt (NMI) while the Schedule timer generates interrupt 32. Their addresses are listed below.
  • the Timer is set up by writing to the Control register. The time base is also given below.
  • the System Control provides a full range of interrupts for various system control functions. These interrupts are handled by five Intel 8259A Interrupt Controllers. One of these devices is designated the Master Controller and it has two Slave Controllers (Slave 0 and Slave 1) connected to it. The last two 8259As are used to signal error conditions in the main array and are connected to Slave 0. The Main Array Error Controllers must be used in polled mode. The following table lists the controllers, their addresses, and defines the interrupts they handle. Section 3.3.2.1.14 lists the interrupts and vector addresses that are generated by these controllers. Programming details for the 8259A can be found in Intel's Data Catalogue.
  • the System Control generates and handles a complete set of interrupts for managing the system.
  • the interrupts are defined in the table below.
  • the System Control Board supports the Intel 80287 Math Coprocessor (90) as an option.
  • the I/O addresses listed below are activated by invoking the Intel 80286 Escape opcodes used by the Intel 80287.
  • the details on the 80287 are in Intel's Microprocessor Manual.
  • the System Control Board is initialized on system reset.
3.2.1.18 System Summary
  • a Graphics Processor is used to control a raster scan CRT display. This provides a very effective way for displaying and dealing with the very large amount of data that can be computed and output by the system.
  • the graphics system consists of up to 2 megabytes of RAM organized as a 768 by 1024 by 8 bit frame buffer, 16 processing nodes to handle local display processing, a color lookup table, and the logic to handle display refresh and panning.
  • the output of the graphics system is standard RS-343 RGB video data that can be connected to any high-performance (40 MHz) color CRT monitor.
  • the I/O channel bandwidth allows the main processor to output a new frame of display data in excess of 60 times a second (faster than the display refresh rate). This makes the system ideal for a wide range of graphics applications.
  • Two or more systems are interconnected through an I/O channel (an order 7 subcube) in each system.
  • the processor array is made up of 2^N nodes, where N is 6, 7, 8, 9 or 10.
  • Each processing node (FIGURE 4) consists of a general purpose 32 bit processor (including 32 and 64 bit floating point instructions), 128K bytes of ECC memory and 11 communication channels to support the hypercube interconnection scheme and the 8 system I/O channels.
  • the processor recognizes two main classes of data: integers and reals. Integers are represented in standard 2's-complement form and come in three types: byte (B, 8 bits), halfword (H, 16 bits) and word (W, 32 bits). There are two types of reals.
  • the 32-bit format, called real (R), has an 8-bit exponent and 24 bits of significand.
  • the longreal (L) format is 64 bits, with 11 bits in the exponent and 53 in the significand.
  • the longreal format is used for computations that need high accuracy and for intermediate computations with real variables when the computation is particularly sensitive to roundoff error. Both of these formats conform to the IEEE Binary Floating Point Standard (P754).
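As an illustrative aside (not from the patent), the field split of the 32-bit real format just described can be inspected in a few lines:

```python
import struct

# Split a 32-bit IEEE real into its sign, 8-bit exponent and 23-bit stored
# fraction (the 24th significand bit is the implicit leading 1 of normals).
def unpack_real(x):
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

print(unpack_real(1.0))    # → (0, 127, 0): the exponent bias is 127
print(unpack_real(-2.0))   # → (1, 128, 0)
```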
  • the processor recognizes and manipulates addresses. Addresses are simply 32 bit unsigned values that point to individual bytes in a linear address space.
  • the processor's instructions operate on data in main memory (as described above) or on data in 32 bit registers.
  • the processor contains three types of registers: the general registers, the processor registers and the communication control registers.
  • the 16 general registers are 32 bits long and are used for both operands and addresses. Since they are general they can be used interchangeably in all operations and addressing modes.
  • the processor registers are special purpose and can only be written into or read from by the Load Processor Register (LDPR) and Store Processor Register (STPR) instructions, respectively. The exact formats and detailed descriptions of these registers are given in section 4.4.3.
  • the processor registers are shown in FIGURES 7, 9A and 9B and include:
  • Processor registers 6 through 12 are used to signal "ready” and error conditions for the I/O channels.
  • the I/O ports on the processor are unidirectional Direct Memory Access (DMA) channels, and each channel has two 32-bit write-only registers: an address register for the buffer location and a count register indicating the number of bytes left to send or receive. Communication is performed by setting the registers of the desired channel to the appropriate address and data length; the DMA channel then takes over and communicates a message without processor intervention. Interrupts can be used to signal when a channel is available (i.e. when the count reaches zero the channel is "ready"). A separate interrupt vector is provided to indicate to a receiver that an error occurred during the data transmission.
  • An instruction consists of an operation code followed by zero, one or two data references:
  • All instruction operation codes (opcodes) in the processor are one byte long.
  • the first four bits indicate the operation and number of operands (e.g. ADD: 2 operands, BRANCH: 1 operand) while the other four bits denote the operand size and type (e.g. Halfword (integer), Real (floating point)). This symmetry makes an opcode map easy to read and code generation easier for a compiler.
  • All of the standard instructions are available for each data type, including arithmetic, logical, conversion, comparison, branch, call and trap instructions. Instructions can be preceded by a REPEAT prefix that causes them to be executed repeatedly until a termination condition is satisfied. This is a very powerful facility for vector and string operations. Repeats can also be used with both branches and calls in order to execute a block of code repeatedly (i.e. a REPEAT BRANCH is equivalent to a loop instruction). For future extension, each operand type has a reserved "escape" code.
  • a few instructions have no operands (e.g. BREAKPOINT) and some have only one (e.g. CALL) but most have two address fields. All address fields begin with a one byte mode selector. For all modes involving the general registers the first four bits indicate the mode and the remaining four determine which register to use. If there is an offset indicated it follows the mode selector. Some of the modes provided are literal, immediate, direct and indirect with no registers involved; and register direct, register indirect with and without offset, autoincrement and autodecrement and offset addressing with both the program counter (PC) and the stack pointer (SP). As with instructions there is a reserved "escape" code defined for the mode selector field.
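Both the opcode byte and the mode-selector byte split into two four-bit fields, as described above. A minimal sketch (the example byte values below are hypothetical; the patent does not give an encoding table here):

```python
# Both the one-byte opcode and the one-byte mode selector divide into a high
# nibble and a low nibble: operation / operand-size-and-type for opcodes,
# mode / register number for mode selectors.
def split_byte(b):
    return (b >> 4) & 0xF, b & 0xF

operation, size_type = split_byte(0x21)   # hypothetical opcode byte
mode, register = split_byte(0x4A)         # hypothetical mode selector byte
print(operation, size_type, mode, register)
```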
  • the processor recognizes two classes of data: integers and reals (floating point numbers). There are three types of integers and two types of reals.
  • the three integer data types are all represented in standard 2's complement. They are called Byte (B), Halfword (H) and Word (W) and are 8, 16 and 32 bits long respectively.
  • the ranges for the three integer formats are specified as follows:
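The table itself is not reproduced in this extract, but the two's-complement ranges follow directly from the widths; a quick sketch:

```python
# Two's-complement range of an n-bit integer type: [-2^(n-1), 2^(n-1) - 1].
def int_range(bits):
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1

for name, bits in (("Byte", 8), ("Halfword", 16), ("Word", 32)):
    print(name, int_range(bits))
```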
  • the floating point implementation in the processor conforms to the IEEE Binary Floating Point Standard (P754).
  • With the floating-point arithmetic, not only are the rounded results as accurate as possible, but it is also feasible to compute guaranteed bounds on the errors using the special directed rounding modes. Also, because of the high accuracy of Real (32-bit) computations and the availability of Longreal (64-bit) to back them up at crucial points, it will be possible to run many more programs in Real precision instead of automatically using Longreal everywhere.
  • the two formats are closely related; the distinguishing characteristics being the exponent range (defined by the parameter b) and the fraction precision.
  • the Real format has 24 bits of precision (about 7 digits) with a range of approximately 10**(-38) to 10**(38).
  • the Longreal format has a much wider range--about 10**(-308) to 10**(308)--and more than twice the precision of Real, at 53 bits or about 15 digits.
  • Longreal besides being a powerful standalone computational format, makes an excellent backup facility for Real calculations at points in a program where the results are very sensitive to roundoff error.
  • the floating point architecture of the processor implemented in accordance with the principles of the present invention includes much more than the data representations. All of the IEEE Standard requirements are either met in the hardware or are facilitated in software. Among these requirements is the provision of rounding modes. In the Program Status (PS) register are two bits that control the rounding mode in effect. The modes are:
  • the floating point architecture also provides all the standard instructions for all formats: add, subtract, multiply, divide, compare and conversion. But in addition there are some unusual but crucial instructions. Square root is correctly rounded and as fast as divide. Remainder is an exact operation and permits argument reduction for periodic functions with no roundoff error.
  • the 16 General registers (128) are labeled R0 to R15. They are 32 bits wide and are used for data and addresses. They are consistently symmetrical, with no special designations or uses for any of them. When integer data shorter than 32 bits is moved to a General register it is sign-extended to 32 bits. When data longer than 32 bits is stored in the registers, the low-order part of the data goes in the designated register, Ri, and the high-order part resides in Ri+1. The numbers "wrap around", so that if a Longreal is moved to R15 the high-order section is found in R0.
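The wrap-around rule can be modeled in a couple of lines (an illustrative sketch, not the patent's implementation; the word values are arbitrary):

```python
# A 64-bit value moved to Ri puts its low-order word in Ri and its
# high-order word in R(i+1) mod 16.
def store_64(regs, i, low_word, high_word):
    regs[i] = low_word
    regs[(i + 1) % 16] = high_word

regs = [0] * 16
store_64(regs, 15, 0xAAAA, 0xBBBB)   # high-order word wraps into R0
print(hex(regs[15]), hex(regs[0]))
```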
  • each of the 11 input and output ports (48), shown in FIGURE 5 is an independent Direct Memory Access (DMA) channel and has two 32 bit registers: an address register and a count register.
  • the address register contains a pointer to the least significant byte of the next halfword to be transferred. If it is an output port the data is moved from memory out to the port. If it is an input port the data is moved to memory that has been received from the output port of the sending processor. In both cases the count register is set to indicate the number of bytes to be sent or received. As data is sent or received, the appropriate address and count registers are incremented and decremented respectively by the number of bytes transferred. When the count reaches zero the ready flag in the Input or Output Status register (see below) is set and an interrupt is generated if an interrupt has been enabled.
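A minimal model of the register behavior just described (ours, with simplified byte-at-a-time semantics and no interrupt machinery):

```python
# One port's two registers: the address steps up and the count steps down as
# bytes move, and the ready flag is raised when the count reaches zero (at
# which point an interrupt would be generated if enabled).
class DMAChannel:
    def __init__(self, address, count):
        self.address, self.count, self.ready = address, count, False

    def transfer(self, nbytes):
        n = min(nbytes, self.count)
        self.address += n
        self.count -= n
        if self.count == 0:
            self.ready = True
        return n

ch = DMAChannel(address=0x1000, count=6)
ch.transfer(4)
ch.transfer(4)            # only 2 bytes remain; the channel becomes ready
print(hex(ch.address), ch.count, ch.ready)
```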
  • the DMA channels operate independently of instruction processing. They begin functioning whenever a count register is set to a nonzero value. All of the ports are general-purpose, except that one input and one output port are designated "host" (H) and are normally used to communicate over the I/O bus to the System Control Boards.
  • the Processor registers are the third type of register in the processor. All Processor registers are 32 bits wide. They contain all the special-purpose and miscellaneous information and can only be loaded or stored by the Load Processor Register (LDPR) and Store Processor Register (STPR) instructions, respectively. These registers are labeled P0 to P11, but they also have unique names that denote their purpose:
  • C -- Carry is set on integer operations when there is a carry out of the most significant position. It is also set by floating-point instructions and integer multiply and divide to indicate that the result is negative. This allows the use of the Unsigned Branches to implement the "unordered" branches required by the IEEE Floating Point Standard.
  • V -- Integer Overflow is set when the integer result is too large in magnitude for the format.
  • Z -- The Zero flag is set when the integer or floating-point result is zero.
  • N -- Negative is set when the integer or floating-point result is negative. If there is an Integer Overflow, the Negative flag will not agree with the sign bit of the stored result, because the Negative flag is set according to the actual result before Overflow is determined.
  • The Not Comparable flag is set when floating-point values are compared and one or both of the operands is Not-a-Number (NaN).
  • FLOATING POINT EXCEPTIONS -- The indicated flag is set when the associated exception occurs, and if not disabled the corresponding interrupt is generated. (In the present embodiment of the invention only the Inexact Result interrupt can be disabled.)
  • the exceptions are defined in Section 4.5.
  • Timeout Enable (if this flag is zero the interrupt that would be generated by a zero value in the Timeout Register is suppressed.)
  • REP -- Repeat Mode (this field indicates the repeat mode in effect for the instruction following one of the REPEAT operation codes.)
  • REP REG -- Repeat Register (if the repeat mode is not 00, then every time the instruction following a repeat-type operation code is executed, the value in REG is decremented; REG can be any of the General registers.)
  • Fault Register P2, FR: When the processor takes an interrupt generated by an exception this register contains information to aid recovery. The format of the Fault Register is shown below.
  • the Guard, Round and Sticky bits are the hardware bits that are used for rounding in floating point operations as defined in the IEEE Binary Floating Point Standard.
  • the Fraction, Exponent, Invalid and Sign bits for each operand allow an interrupt handler to determine whether the operand is NaN, infinity, denormal, zero or "ordinary" (valid, nonzero), and its sign, without decoding the instruction.
  • Configuration Register (P3, CR): This register is used to set various configuration parameters including the Model Number which is a Read-Only field.
  • the format of the CR is:
  • the processor has a powerful vectored interrupt facility and generates several kinds of interrupts: program exceptions, software facilities, I/O signals and hardware errors.
  • the program exceptions include integer overflow and zero divide, the floating point exceptions, stack overflow and address and reserved opcode faults.
  • the software facility interrupts are trap, breakpoint and trace.
  • the Input Ready, Output Ready, Input Parity and Input Overrun interrupts are the I/O signals.
  • the hardware errors are Corrected and Uncorrectable memory errors and Processor Self Test errors.
  • interrupts including the TRAP and breakpoint (BKPT) instructions
  • Each vector is eight bytes; the first four bytes contain the absolute address vector (VA) of the interrupt handing routine and the next four bytes are a new Program Status (NPS) value.
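The eight-byte vector layout described above can be sketched as a C structure; the type and field names here are illustrative only and do not appear in the specification.

```c
#include <stdint.h>

/* Hypothetical C sketch of one 8-byte interrupt vector entry:
   four bytes of absolute handler address (VA) followed by four
   bytes of new Program Status (NPS). */
typedef struct {
    uint32_t va;   /* absolute address of the interrupt handler */
    uint32_t nps;  /* new Program Status loaded on dispatch */
} interrupt_vector;

/* Dispatch sketch: the vector table is indexed by interrupt number. */
uint32_t handler_address(const interrupt_vector *table, int intno) {
    return table[intno].va;
}
```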
  • the processor pushes the address of the offending instruction ("previous PC") on the stack so that the exception handler can decode the instruction.
  • the interrupt handler executes a Return from Interrupt (REI) instruction that pops the old PS and PC values off the stack and into their respective registers.
  • a TRAP instruction with the appropriate number as its argument can simulate any interrupt (except that the PC is always pushed on the stack with TRAP regardless of its argument).
  • Integer Overflow--the interrupt can be disabled but in either case the result stored in the destination is the low order part of the result (in divide it is the dividend).
  • IZ Integer Zero Divide--when the denominator of an integer divide or remainder is zero this interrupt is generated and no result is stored.
  • IX Inexact Result--when a real result must be rounded the flag is set in the PS and if not disabled the interrupt is generated. In either case the correctly rounded result is first stored at the destination. Inexact Result may occur at the same time as either Overflow or Underflow. If this occurs the Inexact flag is set but the interrupt is suppressed and either the Overflow or the Underflow interrupt is generated.
  • FZ Floating Zero Divide--when division of a nonzero real number by zero is attempted no result is stored, the FZ flag is set in the PS and an interrupt is generated.
  • UE Uncorrectable Memory (ECC) Error--if a memory error occurs that cannot be corrected this interrupt is generated. Since this could occur at many points during the execution of an instruction, the state of the machine is undefined after this error. If this error recurs before the previous one is handled then the internal ERROR flag is set and the ERROR pin is set high. This is to warn of a potentially fatal condition.
  • ORH Output Ready Host--this is the interrupt that is used with the output port that is normally used for communicating with the host (i.e. the various interface boards).
  • IR Input Ready--these are the interrupts used to signal that an input channel is ready to receive a message.
  • IRH Input Ready Host--this interrupt is used with the input channel that is usually used for communicating with the host.
  • IEH Input Error Host--if an error is detected on the channel used for host communication this interrupt is generated.
4.5.2 Error Flag
  • The Error flag is tied to the ERROR pin and indicates that the processor is in an unknown, inconsistent or failure state.
  • the Error flag and pin can also be set and reset by the ERON and EROF instructions respectively.
  • Ports 31 and 63 are normally used for communicating with the Host (on any System Control Board). Ports 0 to 9 and 32 to 41 are used to build the hypercube interconnection network. Numbers 10 to 30 and 42 to 62 are reserved for future expansion.
  • Each of the I/O channels has an address register, a count register, a "ready" flag and an interrupt enable flag.
  • each input channel has a parity error flag, an overrun error flag and a "DMA pending" flag.
  • In addition to the enable flag for each channel, there are two global enable flags in the Program Status (PS) register.
  • In order to start the automatic message output, the corresponding count register must be set to the number of bytes in the message. (In this version of the processor the low order bit is forced to zero in both the address and the count registers; thus the message buffer must start on an even byte boundary and be an even number of bytes long. No error is signaled if a program violates this requirement.)
  • This is done by executing a LCNT (Load Count) instruction.
  • the destination operand indicates the register to be loaded as explained above for the LPTR instruction and the source operand is the count value (an unsigned 32 bit integer).
  • the LCNT instruction also resets the parity and overrun error flags when setting up an input port.
  • the message transmission is automatic and as data is sent the address register is incremented and the count is decremented by the number of bytes transferred. When the count becomes zero the output stops, the ready flag is set and if enabled the ready interrupt is generated.
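The automatic transmission behavior just described can be sketched in C; the structure and function names are hypothetical, and a two-byte (halfword) transfer unit is assumed.

```c
#include <stdint.h>

/* Hypothetical sketch of an automatic output channel: each transfer
   advances the address register and decrements the count; when the
   count reaches zero the channel sets its "ready" flag. */
typedef struct {
    uint32_t address;   /* points at the next bytes to send */
    uint32_t count;     /* bytes left in the message (kept even) */
    int      ready;     /* set when the message is complete */
} out_channel;

/* Send one halfword (two bytes) of the message. */
void dma_step(out_channel *ch) {
    if (ch->count == 0) return;
    ch->address += 2;
    ch->count   -= 2;
    if (ch->count == 0)
        ch->ready = 1;  /* ready interrupt would be raised if enabled */
}
```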
  • In addition to sending a message on a single channel, the processor has a powerful BROADCAST facility. In order to send a message over several channels at once, one must first ensure that the desired output channels are ready. Then a BPTR (Broadcast Pointer) instruction is executed. Its source operand is the address of the message as in LPTR but its destination operand is a 32 bit mask. Every bit position that is set to one will cause the corresponding output channel address register to be loaded.
  • the message broadcast is started by executing a BCNT (Broadcast Count) instruction whose destination operand is a mask as explained above for the BPTR instruction and whose source operand is an unsigned 32 bit integer equal to the number of bytes in the message.
  • the major advantage of broadcasting is that the sending processor only has to access each transmitted datum once thus reducing the memory bandwidth used by the DMA facility. The processor can only handle one broadcast at a time so if a subsequent broadcast is attempted, even on different channels, before the current one is finished the results will be undefined.
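The mask semantics of BPTR can be illustrated with the following C sketch; the function name and register array are hypothetical.

```c
#include <stdint.h>

/* Hypothetical sketch of BPTR: every bit set to one in the 32 bit
   destination mask loads the message address into the corresponding
   output channel's address register. */
void bptr(uint32_t mask, uint32_t msg_addr, uint32_t addr_reg[32]) {
    for (int ch = 0; ch < 32; ch++)
        if (mask & (1u << ch))
            addr_reg[ch] = msg_addr;
}
```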
  • In order for a message to be transmitted successfully, the corresponding input channel of the receiving processor must first be set up with an address to an input buffer and the same count as the output channel.
  • the processor recognizes two types of errors in communication. Each halfword is sent with a parity bit and on reception a parity check is made. Also if a halfword is received into a DMA channel before the previous one is stored in memory an input overrun error is detected. (Overrun can occur when the input count goes to zero before the output count--a software error, or when too many messages are being sent to the processor at the same time.) If either type of error occurs the corresponding flag is set and when the input count reaches zero instead of "ready", an "input error” interrupt is generated (if II is set). A software error that is not detected by the processor occurs when the output count is smaller than the input. In that case, after the message is sent the input channel will simply hang. This condition can be avoided by correct software or by setting up timeout conditions using the Timeout Register.
  • the processor is designed to be as simple and symmetric as possible. Most instructions work on all supported data types; the General registers are interchangeable in all operations; all address modes work with all instructions including Branches.
  • An instruction consists of an operation code (opcode) followed by zero, one or two address fields. The representation of a two address instruction in memory is illustrated below:
  • REFERENCE 2 is both one of the operands and the result. For example, if the OPCODE indicated Subtract then the operation performed would be:
  • All opcodes are one byte long and each operation type group has at least one reserved code for future expansion.
  • the byte is divided into two fields of four bits each.
  • the first field, TP specifies the length and type of the operands (e.g. 8 bit integer, 32 bit real) and the second field, OP, determines the operation and number of operands (e.g. Add--2 operands, Call--one operand).
  • Each of the operations is described in detail in chapter 4.8 but most are evident from their name in the opcode table below.
  • the first field is represented horizontally with the even values above the odd values.
  • the second field is displayed vertically and is repeated twice.
  • the Opcode Map illustrates a number of symmetries that are explained in the table below.
  • the address fields always have at least one byte.
  • the first byte called the Mode Specifier, encodes the addressing mode and for most of the instructions the first four bits specify the general register to be used in the address evaluation while the next four bits indicate the mode.
  • the format is as shown below:
  • The assembler will choose the shortest reference form possible.
  • the addressing modes are described in detail below. First note the following:
  • Since the encoding for literal includes modes 0, 1, 2 and 3, there are six bits for the definition of the literal value.
  • the six bits are treated as a standard 2's complement integer between -32 and +31.
  • If the instruction indicates that the literal is a real value, the integer value is converted implicitly (without round off error) to the equivalent floating point value.
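The six-bit literal encoding above can be illustrated by a small C decoder; this is a sketch under the stated range of -32 to +31, not the processor's actual logic.

```c
#include <stdint.h>

/* Hypothetical decoder for the six-bit literal field, treated as a
   standard two's complement integer between -32 and +31. */
int decode_literal(uint8_t six_bits) {
    int v = six_bits & 0x3F;          /* keep the low six bits */
    return (v & 0x20) ? v - 64 : v;   /* sign-extend bit five */
}
```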
  • the operand is contained in the indicated register.
  • the value is interpreted according to the instruction: real for floating point instructions, integer for integer operations and bit string for logical instructions. If a longreal operand is expected the low order part is in Rn and the high order part in Rn+1. When a byte or halfword is moved to a register it is sign-extended.
  • the indicated register contains the address of the low order byte of the operand.
  • The indicated register is decremented by the length in bytes of the operand and then its contents become the address of the operand.
  • This mode can be used to build a software stack or to access consecutive array elements.
  • the data addressed by Rn is first accessed and then Rn is incremented by the number of bytes in the operand. This mode is used to step through arrays and, with Autodecrement, to build software stacks.
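The use of Autodecrement and Autoincrement to build a software stack can be sketched as follows; the 4-byte word size, little-endian byte order and memory layout here are illustrative assumptions.

```c
#include <stdint.h>

/* Hypothetical sketch of a software stack built from the
   Autodecrement and Autoincrement modes: -(Rn) pushes and (Rn)+
   pops, with Rn playing the role of the register and "memory"
   being a small byte array. */
static uint8_t  mem[256];
static uint32_t Rn = 128;

/* -(Rn): decrement Rn by the operand length, then store there. */
void push_word(uint32_t v) {
    Rn -= 4;
    for (int i = 0; i < 4; i++)
        mem[Rn + i] = (uint8_t)(v >> (8 * i));
}

/* (Rn)+: load from the address in Rn, then increment by the length. */
uint32_t pop_word(void) {
    uint32_t v = 0;
    for (int i = 0; i < 4; i++)
        v |= (uint32_t)mem[Rn + i] << (8 * i);
    Rn += 4;
    return v;
}
```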
  • the register Rn points to a 32 bit value that is the address of the operand. After the operand is accessed Rn is incremented by four, since addresses are four bytes long.
  • The contents of Rn are added to the offset (in this mode only a 32 bit offset is allowed) and the 32 bit value at that address is the address of the operand.
  • This mode is also available with either PC or SP instead of a general register (see below).
  • the address is calculated by adding the address of the instruction (the value of PC before the current instruction is executed) to the sign-extended value of the offset which can be a byte, halfword or word.
  • This mode is used to access operands relative to PC and with branch instructions to jump relative to PC. (The Literal mode with branch instructions also is relative to PC.) This permits compiling position independent code.
  • the address of the instruction (the contents of PC) is added to the word offset and the 32 bit value at that address is the address of the operand.
  • the address is calculated by adding the SP and the sign extended offset.
  • the offset can be a byte, halfword, or word. This mode is often used to access local variables in an activation record on the stack.
  • the SP and the word offset are added together and the 32 bit value at that address is the address of the operand.
  • the address is the unsigned value of the offset (byte, halfword or word depending on the mode) that follows the mode specifier.
  • the word that follows the mode specifier points to a 32 bit value that is the address of the operand.
  • The operand follows the mode specifier. For arithmetic and logical operators the length and type of the value is indicated by the instruction. Thus, ADDB (Add Byte) will assume an 8 bit signed integer while MULL (Multiply Longreal) will expect to find a 64 bit floating point operand as the "value". An immediate operand used with a branch or move address instruction causes an invalid operand fault. If this mode is used as the destination (the second address in a two address instruction) an Operand error is signaled. When this mode is the first specifier it takes the operand from the top of the stack and then increments ("pops") SP by the length of the operand.
  • An ADDR SP,mem will use a 32 bit real value from the top of the stack as the first operand, pop the stack and store the result at "mem".
  • A MOVH SP,mem will move the halfword on the top of the stack to "mem" and pop the stack.
  • MOVR mem,SP will decrement SP by four (the length of the operand) and move the real value at "mem" to the top of the stack.
  • When this mode is used in both specifiers then the classical stack operations result: both operands are popped off the stack, the operation is performed and the result is pushed back on the stack. In the case of Divide and Subtract the operand at the top of the stack is the dividend and subtrahend respectively. If both specifiers are SP for a Move instruction, only the flags are affected.
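The classical stack behavior just described can be sketched in C; per the text the value on top of the stack is the subtrahend of a Subtract, and all names here are hypothetical.

```c
#include <stdint.h>

/* Hypothetical sketch of a two-specifier stack Subtract: both
   operands are popped, the operation is performed and the result
   is pushed back. The top of stack is the subtrahend. */
static int32_t stack[64];
static int sp = 64;                    /* empty descending stack */

void push(int32_t v) { stack[--sp] = v; }
int32_t pop(void)    { return stack[sp++]; }

void sub_stack(void) {                 /* SUB SP,SP */
    int32_t subtrahend = pop();        /* top of stack */
    int32_t minuend    = pop();
    push(minuend - subtrahend);
}
```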
  • the instructions are listed alphabetically (by mnemonic) and are grouped according to operation (e.g. all the ADD instructions are grouped together).
  • the memory format of all of the instructions is shown below.
  • the source and destination specifiers are optional. While most instructions have two addresses, there are a few with zero or one address.
  • the source (src) address is always evaluated first and all addressing operations (e.g. autodecrement) are performed before the destination (dsrc, des) address is evaluated.
  • "dsrc" refers to the operand before the operation is performed and "des" refers to the contents of that address after the operation.
  • An exception is the stack addressing modes, where the SP at the beginning of the instruction is always used. Any addressing mode that refers to the PC (or SP) uses the value of the PC (or SP) at the beginning of the instruction.
  • the source operand is never changed except when using the stack addressing mode. If an instruction with byte or halfword operands references a general register, the high order part of the data is ignored if it is a source and if it is a destination the high order part is sign extended.
  • the result stored at the destination of a floating point instruction is described below.
  • the result is stored before the exception is signaled by an interrupt (except for Zero Divide and Invalid).
  • The Negative (N) flag is always set according to the sign of the correct result. Thus on integer overflow, the destination may appear positive even when N indicates negative.
  • a processor can be initialized by either asserting the reset pin or by executing a RSET instruction.
  • the resulting initialization is significantly different in the two cases. They are both described below.
  • Hardware initialization is done by asserting the reset pin and proceeds in several steps:
  • the Monitor is a simple, single user system that is in effect when the system is powered on.
  • the Monitor uses terminal 0 and provides extensive diagnostic and management functions.
  • The Operating System, IX™ (IX is a trademark of NCUBE Corporation), is automatically invoked if the system is in Normal mode and passes the diagnostic tests.
  • IX™ is a fully protected multiuser, multitasking operating system with complete resource management including memory, main array, graphics and file system.
  • the file system has a hierarchical structure and is distributed across all the disk drives in the system. Thus, a user can access his files regardless of which terminal (or Peripheral Controller) he uses.
  • The IX System is described in section 5.3.
5.2 The Monitor
5.2.1 Introduction
  • the Monitor is contained in the system EPROM and is invoked when the system is powered on.
  • the Monitor always communicates with Terminal 0 on Peripheral Controller 0 (the System Console) for displaying messages and receiving commands.
  • the Monitor runs the diagnostics and boots the Operating System (if the diagnostics run successfully). If the mode switch is set to "Diagnostic", the Monitor goes into a single user system after successfully running the diagnostics.
  • the Monitor system provides a large range of offline diagnostic and backup facilities.
  • the Monitor consists of two parts: the ROM Monitor and the RAM Monitor. They are both in the system EPROM but the ROM Monitor uses no RAM even for stack space while the RAM Monitor, when invoked, is copied to RAM and uses RAM for data.
  • The ROM Monitor starts the system and executes the diagnostics up to the memory test phase. If memory test passes, the RAM Monitor is automatically invoked; but if it fails, the system stays in the ROM Monitor and a few simple commands are available (see 5.2.3).
5.2.2 Monitor Diagnostics
  • ADDR consists of two 4 digit hexadecimal numbers separated by a colon. The first number is the segment selector and the second is the offset.
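The ADDR operand form above can be parsed as shown in this illustrative C sketch; the function name is hypothetical and is not part of the Monitor.

```c
#include <stdio.h>

/* Hypothetical parser for the ADDR operand: two 4 digit hexadecimal
   numbers separated by a colon, the first being the segment selector
   and the second the offset. Returns 1 on success. */
int parse_addr(const char *s, unsigned *selector, unsigned *offset) {
    return sscanf(s, "%4x:%4x", selector, offset) == 2;
}
```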
  • SEG MAX is the number of 64 Kbyte segments of memory to be tested (starting from memory address 0).
  • a section of memory from ADDR to ADDR+LENGTH-1 is displayed in the following format:
  • ADDR is the beginning address
  • "hhhh" represents a 16 bit word in hex and the ASCII equivalent of the 8 words is also displayed ("." represents unprintable characters).
  • goto ram monitor
  • the RAM Monitor is booted.
  • VALUE is written to I/O address IDADDR.
  • a "line feed” repeats the command at the same address but allows a different VALUE to be typed and “return” terminates it.
  • the system is powered down.
  • the RAM Monitor is invoked automatically if the diagnostics pass the memory test or explicitly by typing "g" in response to the ROM Monitor prompt.
  • the RAM Monitor Commands are of four types: general, debugging, disk control or tape control.
  • the general commands are invoked by typing the first letter of the command name.
  • The debugging, disk control and tape control commands are invoked by first typing "y", "x" or "t" respectively, followed by the first letter of the specific command name. If "return" is the first character typed, a new monitor prompt, ">", is printed and the command analyzer is restarted.
  • A "^c" can be typed at any time and regardless of what is happening, it will be aborted and a new prompt will be displayed.
  • the operand specifications are the same as the ROM Monitor's (see 5.2.3) but with several additions.
  • The operating system, IX™, is a high performance UNIX-style interface to the hardware. It supports multiple users, including password and billing, and multitasking.
  • the editor, NMACS is screen oriented and is similar to a simplified version of EMACS.
  • the file system is the most prominent feature of the operating software because nearly every system resource is treated as a type of file.
  • the file system is hierarchical like UNIX but has extensive mechanisms for file protection and sharing.
  • the operating system treats memory as a collection of segments that can be allocated and shared. Processes are created and scheduled (priority, round robin) by the system and provide part of the protection facility. There is a debugger and a linking loader.
  • One of the unique facilities of the IX™ system is the management of the main processing array.
  • the file system is the user's uniform interface to almost all of the system resources.
  • the two main entities in the file system are directories which provide the structure and files which contain the data.
  • Most resources (e.g. printers, terminals, the processing array) are represented as files.
  • a file has a name which both uniquely identifies it and indicates its position in the file structure.
  • Files have a set of operations defined that can be performed by a user having the requisite privileges.
  • the third editor is a screen editor called “nm” (NMACS). It is similar to the widely used screen editor EMACS.
  • the system of the present invention provides a segmented virtual memory environment.
  • The virtual address space is 2^30 bytes.
  • Main memory is treated as a set of segments on 256 byte boundaries.
  • the operating system provides allocation, deallocation, extension (segments can grow to 64 Kbytes), compaction and swapping functions.
  • the system relies on the Intel 80286 memory management hardware. Memory is allocated and deallocated with the system call "core".
  • Processes are managed by the operating system as the fundamental units of computation. They are created, scheduled, dispatched and killed by the system in a uniform way for all processes.
  • MCP Master Control Program
  • Whenever a user logs on the system, the MCP checks his name and password. If he is an authorized user and the password is correct, the MCP creates a process for him. The parameters of the process are taken from his "log on" file that is created by the system administrator. These parameters include the priority, the initial program (usually the shell), the preface (user's root directory) and billing information.
  • the logon file for "userl” is named /sys/acct/userl.
  • a process is represented by a data structure in memory. This structure, called a process object, has the following entries:
  • code and data -- these entries point to the code and data for the process program.
  • the process management system implements preemptive, priority, round robin scheduling.
  • the system treats almost all resources as devices which are simply a special type of file.
  • the devices include disk drives, tape drives, printers, graphics hardware, interboard bus, SBX interfaces and the hypercube array.
  • Devices are managed as are other files with open, close, read and write calls. For special operations that do not fall easily in those categories, the operating system supports a "special operation" call. These special operations are things such as setting terminal parameters and printer fonts.
  • the system treats the hypercube array as a device type file. Consequently, it is allocated with an "open” command, deallocated with “close” and messages are sent and received with “write” and “read” respectively.
  • One of the powerful features of the hypercube is that it is defined recursively and so all orders of cube are logically equivalent.
  • The user specifies in the "open" call the subcube order (N) he needs. If a subcube of that order is available, it is initialized and the nodes are numbered from 0 to 2^N-1.
  • the subcube is allocated as close as possible to the Peripheral Controller that the user's terminal is connected to. If no subcube of that size is available, the "open” returns an error condition.
  • the graphics boards are also treated as device files and are allocated and managed by each user with file system calls.
  • the special operations that are defined for the graphics devices are the graphics operations that the hardware itself supports such as line and circle drawing, fill-in, panning, etc.
  • Each System Control board in a system has three SBX connectors. One is used for the cartridge tape controller and another is dedicated to providing the Interboard Bus (a bus for moving data between Peripheral Controllers). The last SBX connector is available for custom parallel I/O applications. There are many potential uses for the SBX Interface including networking, 9 track tape drive controller, etc. Regardless of what it is used for, it will be treated as a device by the operating system. Consequently, it is only necessary to write the appropriate device driver in order to use the standard file system calls for device management.
  • the first level initialization is accomplished by simply turning on the system in Normal mode. When the operating system is booted, it looks for a configuration file called /sys/startup
  • If the startup file exists, a shell is created that runs it as a command file.
  • One example of a command that would very likely be found in the startup file is /sys/bin/spool > /sys/spool.log & which causes the print spooler to be run as a parallel process.
  • The system administrator must perform certain functions such as creating logon files for each user.
  • In addition to initializing the operating system, the hypercube array must be initialized.
  • the initialization of individual processors is discussed in section 4.9. In this section an algorithm for initializing the system is described. The algorithm is based on a tree structure and can be more easily illustrated than described. The diagram below shows the initialization responsibility for each processor assuming there are 16 processors. The binary numbers are the processor ID's and the decimal numbers represent the stage in time of the initialization.
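The tree-structured staging described above can be sketched for a 16-processor array: at each stage every already-initialized node brings up the neighbor whose ID differs from its own in exactly one bit. The code below is an illustrative reconstruction of that schedule, not the patent's own program.

```c
/* Hypothetical sketch of tree-structured initialization for a
   16-processor (order 4) hypercube: node 0 starts out alive, and at
   stage t every node with ID below 2^t initializes the neighbor
   whose ID differs from its own only in bit t. */
#define DIM   4
#define NODES (1 << DIM)

int stage_initialized[NODES];  /* stage at which each node comes up */

void init_array(void) {
    for (int id = 1; id < NODES; id++) stage_initialized[id] = -1;
    stage_initialized[0] = 0;
    for (int t = 0; t < DIM; t++)
        for (int id = 0; id < (1 << t); id++)
            stage_initialized[id | (1 << t)] = t + 1;
}
```

After DIM stages every node has been reached, and each initializing link is a real hypercube edge since the two IDs differ in a single bit.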
  • the file system maintains a cache of buffers for disk sectors to minimize the actual disk traffic.
  • the number of buffers is set by the system variable "caccnt".
  • the buffers are arranged in a linked list with a system variable "lruptr” pointing to the least recently used buffer.
  • the entries in the sector buffer cache table (which is located at "cactab") are called sector buffer descriptors ("secbufdes”) and are specified below.
  • a directory contains pointers to files or other directories.
  • the first name in every directory is ".” and refers to itself. Names of files and directories can have at most 24 characters from the set (a-z,0-9,$,_,.).
  • a directory is made of one or more directory sectors ("dirsec").
  • a directory sector contains up to 32 entries, each of which is 32 bytes. The first entry contains defining information about the directory. The rest of the entries, called directory pointers ("dirptr”), are pointers to files or other directories.
  • the structure of directory sectors and directory pointers are specified below.
  • a data file consists of one file descriptor sector ("fildessec") and as many file pointer sectors ("filptrsec”) as necessary.
  • a file descriptor sector contains a 32 byte header and up to 248 pointers to sectors containing data.
  • a file pointer sector contains a 12 byte header and up to 252 pointers to data.
  • the call returns the index, called the channel number or "fildes", of the descriptor in the open file table. Thereafter, file operations refer to the file through this channel number. There can be up to ??? open files in a process at one time.
  • An open file table consists entirely of open file descriptors so it suffices to specify the format of the descriptors.
  • Each process in the system is represented by a data structure called a process object.
  • the process object is represented by four entries in the Global Descriptor Table (GDT). These four entries are collectively called the process descriptor and all process descriptors are chained together.
  • the first entry is an "invalid" segment descriptor that contains process information: a link to the next process descriptor in the chain, process id, priority and status.
  • the other three entries are valid segment descriptors.
  • the processor descriptor and process object formats are defined below.
  • the process statistics can be used for billing and are accessed with the "getpcs" system call.
  • the process parameters are set at process creation from the log on file or they are inherited from the creating process.
  • the open file table contains open file descriptors. Whenever a file is opened a descriptor for it is entered in this table and its index (channel number) is returned. Whenever a new segment is allocated to a process, a descriptor for it is entered in the LDT.
  • System state: last process -- a pointer to the last dispatched process, used to implement round robin scheduling; process object -- the segment selector for the current process object; time left -- the number of clock ticks left in the time slice of the current process.
  • the system data (“sysdata”) defines the parameters, statistics and variables of the system.
  • the data can be read by invoking the system call "getsys”.
  • the format of the system data is given below.
  • Each device supported by the operating system has a set of device drivers to support it. These routines are accessed through call tables that are indexed by a unique number for each device (see below).
  • the basic system supports the disks, 8530's, 8259's, 8254, array interface, printer and an sbx connected to a 3M tape drive. Additional drivers may be added if other sbx interfaces are installed in the system.
  • the device calls are standardized for all devices. They are: init, open, read, write, alloc, special and seek.
  • the device index definitions for the system are 14: tty6
  • There is a small nucleus that runs in each node of the hypercube array.
  • the main function of the nucleus is to provide communication and synchronization facilities.
  • There is also a simple debugger and a program loader and scheduler.
  • the underlying system level handshaking and buffer management breaks a message up into small blocks and sends (receives) one block at a time. For messages that must be routed through more than one node, this is much more efficient than trying to handle the whole message at once. Also, it prevents a "waiting for buffer" type of deadlock.
  • In response to messages from the Peripheral Controller that is managing the subcube, a node can set breakpoints and read and set memory and registers.
  • The node nucleus has system calls that, in response to messages from the Peripheral Controller currently managing the node, allow a node to load a program and its data and schedule it for execution.
  • This section specifies the calls that a program running in a node can make on the nucleus. It also shows how a program running in the Peripheral Controller that is managing a node can, through sending and receiving messages, access some of the system calls.
  • the list of system calls includes
  • the assembly language code that implements this algorithm is:
  • the hypercube interconnection system was chosen for three main reasons:
  • a gray code is a one-to-one mapping between integers such that the binary representations of the images of any two consecutive integers differ in exactly one place.
  • the domain is assumed to be finite and the largest and smallest integers in the domain are "consecutive".
  • One example of a gray code for three bit integers is
  • One can see why a gray code is important by realizing that in a hypercube, if processor x is connected to processor y, then the binary representations of x and y must differ in exactly one place.
  • A gray code can be implemented with the following algorithm:
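The algorithm itself does not survive in this extract; as a hedged substitute, the standard binary-reflected gray code satisfies the definition in section 7.2.1 and can be written as:

```c
/* Standard binary-reflected gray code F and its inverse G, given as
   an illustration (not necessarily the patent's own algorithm).
   Consecutive integers map to node IDs differing in exactly one bit,
   including the wraparound from the largest integer back to zero. */
unsigned gray(unsigned x) {            /* F */
    return x ^ (x >> 1);
}

unsigned gray_inverse(unsigned g) {    /* G, so that G(F(x)) == x */
    unsigned x = 0;
    while (g) { x ^= g; g >>= 1; }
    return x;
}
```

For three bit integers this produces the sequence 0, 1, 3, 2, 6, 7, 5, 4, each adjacent pair (and the pair 4, 0) differing in a single bit.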
  • a one dimensional grid (or ring) is simply a string of interconnected processors as shown
  • Non-realtime filtering is an example.
  • the mapping in this case is simply any gray code as described in section 7.2.1.
  • If F is the gray code and G is its inverse, then the neighbors of processor x in the ring are F(G(x)+1) and F(G(x)-1), with the addition taken modulo the number of processors.
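That neighbor computation can be sketched as follows (illustrative Python; the function name and arguments are chosen here, not taken from the patent):

```python
def ring_neighbors(x: int, dim: int):
    """Neighbors of processor x in a ring of 2**dim processors
    embedded in a hypercube: F is the gray code, G its inverse,
    and the neighbors are F(G(x)+1) and F(G(x)-1) modulo 2**dim."""
    n = 1 << dim
    F = lambda i: i ^ (i >> 1)      # binary-reflected gray code
    def G(g):                       # inverse gray code
        i = 0
        while g:
            i ^= g
            g >>= 1
        return i
    return F((G(x) + 1) % n), F((G(x) - 1) % n)
```

Because F maps consecutive integers to codes differing in one bit, both neighbors are directly connected to x in the hypercube.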
  • Steady state problems involving two space dimensions map naturally onto a two dimensional grid.
  • e.g. boundary value problems
  • a three dimensional mapping is analogous to the two dimensional case except the processor ID numbers are divided into three parts instead of two and there are six neighbors instead of four.
  • For a four dimensional grid, the processor ID number is divided into four parts and each processor has eight neighbors.
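As an illustrative sketch of the two dimensional case (the field ordering and names are assumptions made here, not the patent's): the processor ID is split into a row field and a column field, each stepped by a gray code, so every one of the four neighbors differs from x in exactly one bit and is therefore a direct hypercube link:

```python
def grid_neighbors(x: int, rows_dim: int, cols_dim: int):
    """Map a 2-D (toroidal) grid of 2**rows_dim by 2**cols_dim
    processors onto a hypercube by splitting the ID into two
    gray-coded fields. Returns (up, down, left, right) neighbors."""
    F = lambda i: i ^ (i >> 1)      # gray code
    def G(g):                       # inverse gray code
        i = 0
        while g:
            i ^= g
            g >>= 1
        return i
    row = (x >> cols_dim) & ((1 << rows_dim) - 1)
    col = x & ((1 << cols_dim) - 1)
    def nbr(dr, dc):
        r = F((G(row) + dr) % (1 << rows_dim))
        c = F((G(col) + dc) % (1 << cols_dim))
        return (r << cols_dim) | c
    return nbr(-1, 0), nbr(1, 0), nbr(0, -1), nbr(0, 1)
```

Only one field changes by one gray-code step per neighbor, so exactly one bit of the ID flips; the three and four dimensional mappings simply split the ID into three or four such fields.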
  • Simultaneous linear equation problems are categorized according to the structure of the matrix representing the problem.
  • the two main types are:
  • The method involves computing factors of a matrix so that the problem is solvable in two steps. For example, suppose we want to solve Ax = b, where A is the matrix of coefficients, x is the unknown and b is the known vector. In the factorization methods we compute A = CD, where C and D have some special structure (orthogonal or triangular). Then the equations are solved by computing y and then x: first Cy = b is solved for y, then Dx = y is solved for x.
  • the following user program computes L and U using partial pivoting with actual row interchanges.
  • the elements of L and U replace the elements of A.
  • FIGURE 1 is a diagram of a multiprocessing system in which the present invention is embodied.
  • a clock board (10), a number (1 to k) of processor array boards (12), and a number (1 to x) of system control boards (14), are plugged into slots (J1-J24) in a backplane (16).
  • processor array boards (12) is shown in more detail in FIGURE 3, and is described in Section 8.1.
  • system control boards (14) is shown in FIGURE 12, and is described in Section 8.9.
  • the other 5 I/O channels are brought to the edge of the board for access to the backplane. 4 of these 5 channels are routed via backplane interconnections to other array boards to build larger hypercubes as described in Section 8.2 below.
  • each processing node is connected to one of the eight I/O slots in the backplane which receive eight system control boards.
  • each one of the eight system control boards (14) in the I/O slots of FIGURE 1 is able to communicate directly with up to 128 processing nodes.
  • One of the 64 processing nodes on the processor array board of FIGURE 3 is shown in FIGURE 4.
  • the wiring utilizes 6 (n) of the 10 (p) serial interconnect channels to effect the interconnections among the nodes.
  • the NcubeTM processor block (30) of FIGURE 4 is shown in more detail in FIGURE 5, and is comprised of Floating Point Unit (40), Address Unit and Instruction Cache (42), Instruction Decoder (44), Integer Execution Unit (46), I/O Ports (48), and Memory Interface (50), which are attached to either or both of a common address bus (52), and data bus (54). These units are described in sections 8.3 through 8.8 below.
  • FIGURE 2a is a detailed diagram of the arrangement of the serial communications interconnect on the backplane of the multiprocessing system shown in FIGURE 1.
  • Processor array boards are inserted into one or more of the 16 slots 0 through F to form hypercube structures according to the following list:
  • the backplane wiring routes signal lines to connect groups of boards together as shown in FIGURE 2A.
  • an order 7 hypercube is achieved by inserting 2 boards in slots 0 and 1, or 2 and 3, or 4 and 5, etc.
  • An order 8 hypercube is achieved by inserting 4 boards in slots 0 through 3 or 4 through 7, etc.
  • An order 9 hypercube is achieved by inserting 8 boards in slots 0 through 7 or 8 through 15.
  • An order 10 hypercube is achieved by inserting 16 boards in slots 0 through 15.
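The slot arithmetic above can be summarized in a short sketch (illustrative only; the function names are chosen here): each array board holds 64 nodes, an order 6 hypercube, so an order d cube needs 2^(d-6) boards occupying an aligned, contiguous group of slots:

```python
def boards_for_order(order: int) -> int:
    """Each processor array board is an order-6 hypercube (64 nodes),
    so an order-d hypercube needs 2**(d-6) boards."""
    assert 6 <= order <= 10
    return 1 << (order - 6)

def valid_slot_groups(order: int):
    """The backplane joins aligned groups of the 16 array slots:
    an order-d cube occupies a contiguous group of 2**(d-6) slots
    starting at a multiple of the group size."""
    size = boards_for_order(order)
    return [list(range(s, s + size)) for s in range(0, 16, size)]
```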
  • the I/O interconnect wires are shown at the bottom of FIGURE 2A.
  • Each line includes 128 I/O channels which are connected from a system control board in an I/O slot and fan out to up to 8 processor array boards, 16 channels going to each of the 8 boards.
  • Each one of the 16 channels goes to the host serial channel (34), FIGURE 4, on a processing node. Since there are a total of 64 such nodes on a processor array board, four system control boards in I/O slots 0 through 3 of FIGURE 2A provide the 64 channels on each processor array board in array board slots 0-7, and four system control boards in I/O slots 4-7 of FIGURE 2A provide the 64 channels on each processor array board in array board slots 8-15.
  • FIGURE 2b is a detailed diagram of the system control interconnect on the backplane of the multiprocessing system shown in FIGURE 1.
  • the control lines include system reset lines, clock lines, and array error lines.
  • The clock board (10) of FIGURE 1 is inserted in a slot between slots J12 and J13.

8.3 Floating Point Unit
  • the floating point unit (40) shown in FIGURE 5 is comprised of four input operand registers (56) which receive data from the data bus (54).
  • the operand select MUX (58) selects, from the appropriate input operand register, the sign and exponent portion and the significand portion.
  • the sign and exponent portion is delivered to the sign and exponent logic (60).
  • the significand portion is delivered to the significand logic (62).
  • the logic blocks (60, 62) perform the floating point arithmetic specified by the instruction definition in Section 4.8.
  • the sign and exponent logic (60) and the significand logic (62) outputs are connected to the operand register (64) which returns the data to the data bus (54).
  • FIGURE 7 is a detailed block diagram of the address unit and instruction cache (42) shown in FIGURE 5.
  • the refresh address register (100) contains a pointer to memory which is the value of the address in memory which is to be refreshed next. After each refresh cycle is taken, this pointer is incremented.
  • the Stack Pointer Register (102) contains a pointer which points to the top of the stack. The stack pointer register is described in Section 4.2.2 above, under General Registers.
  • the operand address register (104) is an internal register to which computed effective addresses are transferred before a memory cycle is performed. The operand address register is connected to the address bus.
  • the program counter (106) points to the next instruction to be executed. It is incremented the appropriate number of bytes after the instruction is executed. It is also affected by call, return, and branch instructions which change the execution flow.
  • the program counter is connected to the instruction fetch address register (108) which is a pointer into the memory location from which instructions are currently being fetched. These instructions are loaded into the instruction cache (114).
  • the instruction cache allows for fetching several instructions ahead of the instruction that is being executed.
  • the shadow ROM (110) is described in Section 4.9. It contains instructions that are executed prior to the transfer of control to user code upon system initialization.
  • The instruction cache provides a buffer for data after prefetch and before the actual execution of the stored instruction. It also provides some retention of the data after it has been executed. If a branch is taken back to a previous instruction for reexecution, and if that previous instruction is within 16 bytes of the currently executing instruction, the data corresponding to that previous instruction will still be stored in the cache. Thus, a memory fetch cycle will not have to be taken.
  • the instruction cache is both a look-ahead and look-behind buffer.
  • the MUX (112) is a multiplexer that multiplexes between instructions coming from the shadow ROM or coming from memory after initialization.
  • the instruction decoder (44) shown in FIGURE 5 receives an instruction stream from the instruction cache of FIGURE 7.
  • the instruction decoder includes an opcode PLA (101) which decodes static information in connection with the opcode of an instruction, such as number of operands, type of operands, whether the instruction is going to take a single cycle to execute or many cycles to execute, and what unit the instruction is going to execute in (the instruction execution unit or the floating point unit). This information is latched in the opcode latch (103). The operand itself is latched into the operand latch (105).
  • the operand sequencer PLA (107) is a state machine whose main function is to supervise the decoding of operands.
  • The operand decode PLA (109) is a state machine whose main function is to compute effective addresses for each of the addressing modes and to supervise the execution of instructions.
  • the execute PLA (111) is a state machine whose main function is to execute the instruction in conformance with the definition of instructions as given in Section 4.8 above.
  • the Processor Status Register (126) contains flags, interrupt controls and other status information.
  • the Fault Register (124) stores the fault codes.
  • the Configuration Register (120) stores the model number (read only) and the memory interface parameters.
  • the Processor Identification register (122) contains a number that identifies the processor's location in the array.
  • the Timer register (116) contains a counter that is decremented approximately every 100 microseconds and generates an interrupt (if enabled) when it reaches zero.
  • the refresh timer (118) is a time-out register used to time the period between refreshes. This register is initialized from eight bits out of the configuration register and it decrements those eight bits. When the timer goes to zero, a refresh is requested.
  • the register file (128) is described in Section 4.4.1 above. It includes 16 addressable registers that are addressable by the instruction operands.
  • the temporary register (130) is an internal register used during the execution of instructions. It is connected to the integer ALU (132) which is used during the execution of integer instructions.
  • the sign extension logic (134) takes the result from the ALU block and, according to the data type of the result, extends the sign to a full 32-bit width. It also checks for conversion overflows.
  • The barrel shifter (136), the shift temporary register (138), and the shift count register (140) are used to execute the shift and rotate instructions.
  • the port select register (142) is an internal register in which the register number of the serial I/O port to be selected for the next operation is stored.
  • the control register select register (144) is an internal register in which the address of the control register to be selected for the next operation is stored.
  • the memory data register (146) is an internal register used for the temporary storage of data which is destined to be written into memory. It is an interface register between the instruction execution unit and the memory interface.
  • FIGURES 10A and 10B comprise a composite block diagram of a single I/O port representative of one of the 11 I/O ports (48) on each processor shown in FIGURE 5. Each port has all the circuitry necessary to both receive and transmit serial messages. The format of the messages is described in Section 5.4.1 above. Data are received on the serial data in line (150) and are framed in the input shift register (152). The information is then transferred in parallel to the input latch (154) and is stored there until it is transferred to the memory on the memory data in lines (156).
  • data to be transmitted is brought in from the memory data out-lines (158), stored in the output latch (160), and then transferred to the output shift register (162), and transmitted serially on the serial out line and combined with parity bits from the parity-bit generator (164).
  • the input port and the output port both contain an address pointer and a byte counter.
  • the address pointers (166, 170) point to the locations in memory where the message will be written to or read from.
  • the input and output byte counters (168, 172) are utilized to specify the length of message to be sent or received. All of these four registers are initialized by the appropriate instruction: the load address pointer instruction, and the load byte counter instruction.
  • the input address pointer (166) is incremented by two bytes and the input byte counter (168) is decremented by two bytes.
  • the output address pointer (170) is incremented by two bytes and the output byte counter (172) is decremented by two bytes.
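A toy model of these pointer/counter semantics may make the stepping concrete (illustrative only; the class and method names are chosen here, beyond the load instructions the text itself mentions):

```python
class PortChannel:
    """Toy model of one serial port channel: an address pointer and a
    byte counter, initialized by the load address pointer and load
    byte counter instructions, stepped two bytes per transfer; the
    port becomes ready (and may interrupt) when the count reaches 0."""
    def __init__(self):
        self.addr = 0
        self.count = 0

    def load_address_pointer(self, addr: int):
        self.addr = addr

    def load_byte_counter(self, nbytes: int):
        self.count = nbytes

    def transfer_halfword(self) -> bool:
        """One 16-bit transfer to/from memory; returns True when the
        byte counter reaches zero, i.e. the message is complete."""
        assert self.count >= 2, "channel not active"
        self.addr += 2
        self.count -= 2
        return self.count == 0
```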
  • the control portion of the serial port is shown in Figure 10A.
  • the parity error flag (180) is set by the input controller when there is a parity error detected on an input message.
  • the full flag (182) is set by the input controller during the time that the input latch (154) is buffering a message which has not yet been transferred into memory.
  • The overflow flag (184) is set by the input controller when the input latch is holding a message to be sent to memory and the input shift register (152) finishes receiving a second message, which overwrites the first before it is transferred to memory.
  • the input enable flag (186) is a flag which is both readable and writable by the user to enable interrupts that occur when the input port becomes ready, i.e. when the byte count goes to zero.
  • the full flag (190) on the output port controller is set for the period of time when there is data in the output latch which has not been transferred to the output shift register.
  • the broadcast flag (192) is initialized by the broadcast count instruction.
  • When this flag is set, it indicates that this particular output port is a member of the current broadcast group.
  • any data coming over the memory data out bus (158) for broadcasting will be transmitted out of this port and simultaneously out of all other ports that have their broadcast flags on.
  • the port interrupt logic (194) generates interrupts if enabled when the input or output ports have finished transmitting or receiving messages, as signaled by the appropriate byte counter being decremented to zero.
  • the port memory arbitration logic (196) performs the function of arbitrating for memory with all the other I/O ports. The winner of this arbitration must again arbitrate with other units on the chip in the memory interface unit described in Section 8.8. When an arbitration is successful and a memory grant is given, the memory grant line indicates that data either has been taken from the memory data in bus or that the data is available on the memory data out bus shown in Figure 10B.
  • the memory interface logic interfaces between the several internal units which need to access memory and the memory itself.
  • the memory control block (200) receives the memory request lines from the various internal parts of the chip and memory requests external to the chip via the memory request pin.
  • The memory request pin allows the Intel 80286 to request a memory cycle of a processor's memory, in which case the memory interface logic performs the function of a memory controller, providing the RAM control lines from the timing generator (202) while allowing the Intel 80286 to actually transfer the data in and out of the memory.
  • the memory control prioritizes these requests according to a given priority scheme and returns memory grants back to the individual requesting unit when it is that unit's turn to use the memory.
  • the memory control specifies to the timing generator when access is to begin.
  • the timing generator provides the precise sequence of RAM control lines as per the memory specifications for the particular RAM chip.
  • the memory control also specifies when the address is to be transferred from the address bus through the address latch (204) to the address pins of the memory chip.
  • the memory control also controls the transfer of information from the data collating registers (206) and the internal buses to and from which data is transferred internally.
  • the data collating registers (206) perform two functions. First, they bring many pieces of a data transfer together, for example, for a double-word transfer the registers will collate the two single words into a double word. Second, the data collating registers align the data with respect to the memory, such that if data is being written to an odd location in memory the data collating registers will skew the data to line up with memory.
  • the ECC check/generate logic (208) is used to generate the ECC during a write operation and to check for errors during a read operation.
  • the ECC syndrome decode (210) operates during a read operation to flag the bit position that is in error as determined by the ECC check logic. A single-bit error can be corrected by the error correction code and this bit position will be corrected automatically by the ECC syndrome decode logic.
  • FIGURE 12 is a detailed block diagram of the system control board (14) shown in FIGURE 1. It includes an array interface (212), shown in more detail in FIGURE 13, a 2MB System RAM (214), SMD disk drive controller (216), parallel I/O interface (218), System I/O Interface (220), CPU and Control (222), Auxiliary I/O Interface (224), and SBX and EPROM (226).
  • the address buffers (354) and the data buffers (356) are connected via the data lines and the buffer lines to the local RAM (352).
  • the SMD controller (216) is connected to the local memory (352) and is also connected to the system RAM (214) for the transfer of data from disk to memory.
  • FIGURE 13 is a detailed block diagram of the dual-ported processing nodes and serial communications interconnect of the system control board array interface shown in FIGURE 12.
  • the 16 (r) dual ported processing nodes on an I/O board are therefore connected as two order 3 hypercubes.
  • FIGURE 14 is a detailed block diagram of one of the 16 dual-ported processing nodes of the system control board interface shown in FIGURE 13.
  • the dual-ported processing nodes use the same NCUBETM processor integrated circuit as the array processor of FIGURE 4.
  • the System Control Boards (14) of Figure 1 use the 8 I/O slots on the backplane. Through backplane wiring, these boards are allowed to access up to a 128 processor node subset of the array.
  • Each System Control Board (FIGURE 13) has 16 processing nodes (300) and each node has 8 of its I/O channels (0,1,...,7) dedicated to communicating with the Processing Array through the array interface (212).
  • Let each processor array board slot (among J1-J24) be numbered (xxxx) in binary.
  • the board in that slot contains the hypercube (xxxx:yyyyyy) where yyyyyy is a binary number that can range from 0 to 63. (i.e. the ID's of the processors on board xxxx are xxxxyyyyyy where xxxx is fixed.)
  • the following diagram illustrates the mapping between the nodes in the Main Array and the nodes on a system control board.
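The diagram itself is not reproduced in this excerpt, but the ID construction it illustrates — the 4-bit slot number xxxx concatenated with the 6-bit on-board node number yyyyyy — can be sketched as (illustrative Python; the function name is chosen here):

```python
def array_node_id(slot: int, node: int) -> int:
    """Processor ID in the main array: the 4-bit board slot number
    (xxxx) concatenated with the 6-bit on-board node number (yyyyyy),
    giving IDs of the form xxxxyyyyyy."""
    assert 0 <= slot < 16 and 0 <= node < 64
    return (slot << 6) | node
```

For instance, node 0b000101 on the board in slot 0b0011 has ID 0b0011000101; all 64 nodes on board xxxx share the same fixed high four bits.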

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multi Processors (AREA)
EP86105879A 1985-05-06 1986-04-29 Rechnersystem mit hoher Leistung Expired - Lifetime EP0201797B1 (de)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US06/731,170 US5113523A (en) 1985-05-06 1985-05-06 High performance computer system
US731170 1996-10-10

Publications (3)

Publication Number Publication Date
EP0201797A2 true EP0201797A2 (de) 1986-11-20
EP0201797A3 EP0201797A3 (en) 1988-06-01
EP0201797B1 EP0201797B1 (de) 1993-02-17

Family

ID=24938359

Family Applications (1)

Application Number Title Priority Date Filing Date
EP86105879A Expired - Lifetime EP0201797B1 (de) 1985-05-06 1986-04-29 Rechnersystem mit hoher Leistung

Country Status (3)

Country Link
US (1) US5113523A (de)
EP (1) EP0201797B1 (de)
DE (1) DE3687764T2 (de)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0206580A3 (de) * 1985-06-04 1988-05-18 Thinking Machines Corporation Verfahren Und Vorrichtung zum Verbinden von Prozessoren in einem hyperdimensionalen Netz
GB2201817A (en) * 1987-02-27 1988-09-07 Massachusetts Inst Technology Geometry-defining processors
GB2206428A (en) * 1987-06-15 1989-01-05 Texas Instruments Ltd Computer
FR2626091A1 (fr) * 1988-01-15 1989-07-21 Thomson Csf Calculateur de grande puissance et dispositif de calcul comportant une pluralite de calculateurs
EP0358704A4 (en) * 1987-04-27 1990-12-19 Thinking Machines Corporation Method and apparatus for simulating m-dimension connection networks in an n-dimension network where m is less than n
EP0386151A4 (en) * 1987-11-10 1992-08-05 Echelon Systems Multiprocessor intelligent cell for a network which provides sensing, bidirectional communications and control
US10069674B2 (en) 2013-12-12 2018-09-04 International Business Machines Corporation Monitoring file system operations between a client computer and a file server

Families Citing this family (122)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
NL8801116A (nl) * 1988-04-29 1989-11-16 Oce Nederland Bv Werkwijze en inrichting voor het converteren van omtrekgegevens naar rastergegevens.
US5072371A (en) * 1989-03-01 1991-12-10 The United States Of America As Represented By The United States Department Of Energy Method for simultaneous overlapped communications between neighboring processors in a multiple
US6408346B1 (en) * 1989-11-03 2002-06-18 Compaq Computer Corporation System for communicating with an external device using a parallel port with DMA capabilities and for developing a signal to indicate the availability of data
EP0427407A3 (en) * 1989-11-03 1993-03-10 Compaq Computer Corporation Parallel port with direct memory access capabilities
US5191657A (en) * 1989-11-09 1993-03-02 Ast Research, Inc. Microcomputer architecture utilizing an asynchronous bus between microprocessor and industry standard synchronous bus
US5280474A (en) * 1990-01-05 1994-01-18 Maspar Computer Corporation Scalable processor to processor and processor-to-I/O interconnection network and method for parallel processing arrays
DE69132300T2 (de) * 1990-03-12 2000-11-30 Hewlett-Packard Co., Palo Alto Durch Anwender festgelegter direkter Speicherzugriff mit Anwendung von virtuellen Adressen
US5367642A (en) * 1990-09-28 1994-11-22 Massachusetts Institute Of Technology System of express channels in an interconnection network that automatically bypasses local channel addressable nodes
JP2959104B2 (ja) * 1990-10-31 1999-10-06 日本電気株式会社 信号処理プロセッサ
US5630162A (en) * 1990-11-13 1997-05-13 International Business Machines Corporation Array processor dotted communication network based on H-DOTs
US5815723A (en) * 1990-11-13 1998-09-29 International Business Machines Corporation Picket autonomy on a SIMD machine
US5625836A (en) * 1990-11-13 1997-04-29 International Business Machines Corporation SIMD/MIMD processing memory element (PME)
US5794059A (en) * 1990-11-13 1998-08-11 International Business Machines Corporation N-dimensional modified hypercube
US5734921A (en) * 1990-11-13 1998-03-31 International Business Machines Corporation Advanced parallel array processor computer package
US5963745A (en) * 1990-11-13 1999-10-05 International Business Machines Corporation APAP I/O programmable router
US5588152A (en) * 1990-11-13 1996-12-24 International Business Machines Corporation Advanced parallel processor including advanced support hardware
US5963746A (en) * 1990-11-13 1999-10-05 International Business Machines Corporation Fully distributed processing memory element
US5828894A (en) * 1990-11-13 1998-10-27 International Business Machines Corporation Array processor having grouping of SIMD pickets
US5752067A (en) * 1990-11-13 1998-05-12 International Business Machines Corporation Fully scalable parallel processing system having asynchronous SIMD processing
US5617577A (en) * 1990-11-13 1997-04-01 International Business Machines Corporation Advanced parallel array processor I/O connection
US5765015A (en) * 1990-11-13 1998-06-09 International Business Machines Corporation Slide network for an array processor
US5809292A (en) * 1990-11-13 1998-09-15 International Business Machines Corporation Floating point for simid array machine
US5765011A (en) * 1990-11-13 1998-06-09 International Business Machines Corporation Parallel processing system having a synchronous SIMD processing with processing elements emulating SIMD operation using individual instruction streams
US5590345A (en) * 1990-11-13 1996-12-31 International Business Machines Corporation Advanced parallel array processor(APAP)
US5966528A (en) * 1990-11-13 1999-10-12 International Business Machines Corporation SIMD/MIMD array processor with vector processing
US5765012A (en) * 1990-11-13 1998-06-09 International Business Machines Corporation Controller for a SIMD/MIMD array having an instruction sequencer utilizing a canned routine library
DE69131272T2 (de) * 1990-11-13 1999-12-09 International Business Machines Corp., Armonk Paralleles Assoziativprozessor-System
US5369773A (en) * 1991-04-26 1994-11-29 Adaptive Solutions, Inc. Neural network using virtual-zero
US5594918A (en) * 1991-05-13 1997-01-14 International Business Machines Corporation Parallel computer system providing multi-ported intelligent memory
US5224485A (en) * 1991-05-28 1993-07-06 Hewlett-Packard Company Portable data acquisition unit
US5367692A (en) * 1991-05-30 1994-11-22 Thinking Machines Corporation Parallel computer system including efficient arrangement for performing communications among processing node to effect an array transposition operation
EP0523544B1 (de) * 1991-07-12 2002-02-27 Matsushita Electric Industrial Co., Ltd. Vorrichtung zur Lösung von linearen Gleichungssystem
US5361370A (en) * 1991-10-24 1994-11-01 Intel Corporation Single-instruction multiple-data processor having dual-ported local memory architecture for simultaneous data transmission on local memory ports and global port
US5359714A (en) * 1992-01-06 1994-10-25 Nicolas Avaneas Avan computer backplane-a redundant, unidirectional bus architecture
US5452401A (en) * 1992-03-31 1995-09-19 Seiko Epson Corporation Selective power-down for high performance CPU/system
JP2642039B2 (ja) * 1992-05-22 1997-08-20 インターナショナル・ビジネス・マシーンズ・コーポレイション アレイ・プロセッサ
US5428803A (en) * 1992-07-10 1995-06-27 Cray Research, Inc. Method and apparatus for a unified parallel processing architecture
US5579527A (en) * 1992-08-05 1996-11-26 David Sarnoff Research Center Apparatus for alternately activating a multiplier and a match unit
US5581778A (en) * 1992-08-05 1996-12-03 David Sarnoff Researach Center Advanced massively parallel computer using a field of the instruction to selectively enable the profiling counter to increase its value in response to the system clock
EP0654158A4 (de) * 1992-08-05 1996-03-27 Sarnoff David Res Center Massive parallelrechnervorrichtung.
US5430887A (en) * 1992-10-19 1995-07-04 General Electric Company Cube-like processor array architecture
US5630173A (en) * 1992-12-21 1997-05-13 Apple Computer, Inc. Methods and apparatus for bus access arbitration of nodes organized into acyclic directed graph by cyclic token passing and alternatively propagating request to root node and grant signal to the child node
JP2875448B2 (ja) * 1993-03-17 1999-03-31 松下電器産業株式会社 データ転送装置及びマルチプロセッサシステム
US5717947A (en) * 1993-03-31 1998-02-10 Motorola, Inc. Data processing system and method thereof
EP0632375B1 (de) * 1993-06-04 1999-02-03 Hitachi, Ltd. Verfahren zu multipler Ausführung multipler-Versionprogramme und Rechnersystem dafür
US5680536A (en) * 1994-03-25 1997-10-21 Tyuluman; Samuel A. Dual motherboard computer system
US5586289A (en) * 1994-04-15 1996-12-17 David Sarnoff Research Center, Inc. Method and apparatus for accessing local storage within a parallel processing computer
US5590356A (en) * 1994-08-23 1996-12-31 Massachusetts Institute Of Technology Mesh parallel computer architecture apparatus and associated methods
US5615127A (en) * 1994-11-30 1997-03-25 International Business Machines Corporation Parallel execution of a complex task partitioned into a plurality of entities
US5664192A (en) * 1994-12-14 1997-09-02 Motorola, Inc. Method and system for accumulating values in a computing device
US5603044A (en) * 1995-02-08 1997-02-11 International Business Machines Corporation Interconnection network for a multi-nodal data processing system which exhibits incremental scalability
JPH08286932A (ja) * 1995-04-11 1996-11-01 Hitachi Ltd ジョブの並列実行制御方法
US5805890A (en) * 1995-05-15 1998-09-08 Sun Microsystems, Inc. Parallel processing system including arrangement for establishing and using sets of processing nodes in debugging environment
US5859981A (en) * 1995-07-12 1999-01-12 Super P.C., L.L.C. Method for deadlock-free message passing in MIMD systems using routers and buffers
US5710938A (en) * 1995-07-19 1998-01-20 Unisys Corporation Data processing array in which sub-arrays are established and run independently
US5675823A (en) * 1995-08-07 1997-10-07 General Electric Company Grain structured processing architecture device and a method for processing three dimensional volume element data
US5802340A (en) * 1995-08-22 1998-09-01 International Business Machines Corporation Method and system of executing speculative store instructions in a parallel processing computer system
US6449730B2 (en) 1995-10-24 2002-09-10 Seachange Technology, Inc. Loosely coupled mass storage computer cluster
US5862312A (en) 1995-10-24 1999-01-19 Seachange Technology, Inc. Loosely coupled mass storage computer cluster
US5940859A (en) * 1995-12-19 1999-08-17 Intel Corporation Emptying packed data state during execution of packed data instructions
US5903771A (en) * 1996-01-16 1999-05-11 Alacron, Inc. Scalable multi-processor architecture for SIMD and MIMD operations
US5717884A (en) * 1996-02-02 1998-02-10 Storage Technology Corporation Method and apparatus for cache management
US5774693A (en) * 1996-02-28 1998-06-30 Kaimei Electronic Corp. Multiprocessor parallel computing device for application to the execution of a numerical simulation software program
US5892941A (en) 1997-04-29 1999-04-06 Microsoft Corporation Multiple user software debugging system
US6188874B1 (en) * 1997-06-27 2001-02-13 Lockheed Martin Corporation Control and telemetry signal communication system for geostationary satellites
US6275893B1 (en) 1998-09-14 2001-08-14 Compaq Computer Corporation Method and apparatus for providing seamless hooking and intercepting of selected kernel and HAL exported entry points in an operating system
US6393590B1 (en) * 1998-12-22 2002-05-21 Nortel Networks Limited Method and apparatus for ensuring proper functionality of a shared memory, multiprocessor system
US8174530B2 (en) * 1999-04-09 2012-05-08 Rambus Inc. Parallel date processing apparatus
US8171263B2 (en) 1999-04-09 2012-05-01 Rambus Inc. Data processing apparatus comprising an array controller for separating an instruction stream processing instructions and data transfer instructions
US20080162875A1 (en) * 1999-04-09 2008-07-03 Dave Stuttard Parallel Data Processing Apparatus
US20070242074A1 (en) * 1999-04-09 2007-10-18 Dave Stuttard Parallel data processing apparatus
US20080162874A1 (en) * 1999-04-09 2008-07-03 Dave Stuttard Parallel data processing apparatus
US7526630B2 (en) * 1999-04-09 2009-04-28 Clearspeed Technology, Plc Parallel data processing apparatus
US8762691B2 (en) 1999-04-09 2014-06-24 Rambus Inc. Memory access consolidation for SIMD processing elements using transaction identifiers
US7802079B2 (en) 1999-04-09 2010-09-21 Clearspeed Technology Limited Parallel data processing apparatus
US8169440B2 (en) 1999-04-09 2012-05-01 Rambus Inc. Parallel data processing apparatus
US20080008393A1 (en) * 1999-04-09 2008-01-10 Dave Stuttard Parallel data processing apparatus
US7966475B2 (en) 1999-04-09 2011-06-21 Rambus Inc. Parallel data processing apparatus
EP1181648A1 (de) 1999-04-09 2002-02-27 Clearspeed Technology Limited Parallel data processing apparatus
US6973559B1 (en) * 1999-09-29 2005-12-06 Silicon Graphics, Inc. Scalable hypercube multiprocessor network for massive parallel processing
ATE390788T1 (de) 1999-10-14 2008-04-15 Bluearc Uk Ltd Apparatus and method for hardware implementation or hardware acceleration of operating system functions
US6684343B1 (en) * 2000-04-29 2004-01-27 Hewlett-Packard Development Company, L.P. Managing operations of a computer system having a plurality of partitions
US6725317B1 (en) * 2000-04-29 2004-04-20 Hewlett-Packard Development Company, L.P. System and method for managing a computer system having a plurality of partitions
US6615281B1 (en) 2000-05-05 2003-09-02 International Business Machines Corporation Multi-node synchronization using global timing source and interrupts following anticipatory wait state
US7076633B2 (en) * 2001-03-28 2006-07-11 Swsoft Holdings, Ltd. Hosting service providing platform system and method
US6721943B2 (en) * 2001-03-30 2004-04-13 Intel Corporation Compile-time memory coalescing for dynamic arrays
US20020169941A1 (en) * 2001-05-10 2002-11-14 Eustis Mary Susan Huhn Dynamic processing method
US20030046482A1 (en) * 2001-08-28 2003-03-06 International Business Machines Corporation Data management in flash memory
US7356641B2 (en) * 2001-08-28 2008-04-08 International Business Machines Corporation Data management in flash memory
US8041735B1 (en) 2002-11-01 2011-10-18 Bluearc Uk Limited Distributed file system and method
US7457822B1 (en) * 2002-11-01 2008-11-25 Bluearc Uk Limited Apparatus and method for hardware-based file system
US7263598B2 (en) * 2002-12-12 2007-08-28 Jack Robert Ambuel Deterministic real time hierarchical distributed computing system
US7509533B1 (en) * 2003-06-30 2009-03-24 Sun Microsystems, Inc. Methods and apparatus for testing functionality of processing devices by isolation and testing
US7512721B1 (en) 2004-05-25 2009-03-31 Qlogic, Corporation Method and apparatus for efficient determination of status from DMA lists
US8332844B1 (en) 2004-12-30 2012-12-11 Emendable Assets Limited Liability Company Root image caching and indexing for block-level distributed application management
US7721282B1 (en) * 2004-12-30 2010-05-18 Panta Systems, Inc. Block-level I/O subsystem for distributed application environment management
US7500043B2 (en) * 2005-04-22 2009-03-03 Altrix Logic, Inc. Array of data processing elements with variable precision interconnect
US7620841B2 (en) * 2006-01-19 2009-11-17 International Business Machines Corporation Re-utilizing partially failed resources as network resources
US7882380B2 (en) * 2006-04-20 2011-02-01 Nvidia Corporation Work based clock management for display sub-system
US7937606B1 (en) 2006-05-18 2011-05-03 Nvidia Corporation Shadow unit for shadowing circuit status
US8694758B2 (en) * 2007-12-27 2014-04-08 Intel Corporation Mixing instructions with different register sizes
US7870339B2 (en) * 2008-01-11 2011-01-11 International Business Machines Corporation Extract cache attribute facility and instruction therefore
US7895419B2 (en) 2008-01-11 2011-02-22 International Business Machines Corporation Rotate then operate on selected bits facility and instructions therefore
US9280480B2 (en) 2008-01-11 2016-03-08 International Business Machines Corporation Extract target cache attribute facility and instruction therefor
US7739434B2 (en) * 2008-01-11 2010-06-15 International Business Machines Corporation Performing a configuration virtual topology change and instruction therefore
US20090182992A1 (en) * 2008-01-11 2009-07-16 International Business Machines Corporation Load Relative and Store Relative Facility and Instructions Therefore
US20090182984A1 (en) * 2008-01-11 2009-07-16 International Business Machines Corporation Execute Relative Long Facility and Instructions Therefore
US20090182988A1 (en) * 2008-01-11 2009-07-16 International Business Machines Corporation Compare Relative Long Facility and Instructions Therefore
US7734900B2 (en) 2008-01-11 2010-06-08 International Business Machines Corporation Computer configuration virtual topology discovery and instruction therefore
US20090182985A1 (en) * 2008-01-11 2009-07-16 International Business Machines Corporation Move Facility and Instructions Therefore
TWI397060B (zh) * 2008-11-25 2013-05-21 Ind Tech Res Inst Disk allocation method for an object-oriented storage device
US8804764B2 (en) 2010-12-21 2014-08-12 International Business Machines Corporation Data path for data extraction from streaming data
US8861386B2 (en) * 2011-01-18 2014-10-14 Apple Inc. Write traffic shaper circuits
US8744602B2 (en) 2011-01-18 2014-06-03 Apple Inc. Fabric limiter circuits
US20120198213A1 (en) * 2011-01-31 2012-08-02 International Business Machines Corporation Packet handler including plurality of parallel action machines
US9231892B2 (en) * 2012-07-09 2016-01-05 Vmware, Inc. Distributed virtual switch configuration and state management
US10481933B2 (en) 2014-08-22 2019-11-19 Nicira, Inc. Enabling virtual machines access to switches configured by different management entities
US10241706B2 (en) * 2016-05-20 2019-03-26 Renesas Electronics Corporation Semiconductor device and its memory access control method
US10735541B2 (en) 2018-11-30 2020-08-04 Vmware, Inc. Distributed inline proxy
CN111181738B (zh) * 2020-01-20 2021-11-23 深圳市普威技术有限公司 PoE power supply device and system
US11182160B1 (en) 2020-11-24 2021-11-23 Nxp Usa, Inc. Generating source and destination addresses for repeated accelerator instruction
CN119906696A (zh) * 2021-07-06 2025-04-29 华为技术有限公司 Method for allocating addresses, method for determining nodes, apparatus, and storage medium

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4168469A (en) * 1977-10-04 1979-09-18 Ncr Corporation Digital data communication adapter
US4247892A (en) * 1978-10-12 1981-01-27 Lawrence Patrick N Arrays of machines such as computers
JPS56164464A (en) * 1980-05-21 1981-12-17 Tatsuo Nogi Parallel processing computer
US4533993A (en) * 1981-08-18 1985-08-06 National Research Development Corp. Multiple processing cell digital data processor
US4543642A (en) * 1982-01-26 1985-09-24 Hughes Aircraft Company Data Exchange Subsystem for use in a modular array processor
US4493048A (en) * 1982-02-26 1985-01-08 Carnegie-Mellon University Systolic array apparatuses for matrix computations
US4553203A (en) * 1982-09-28 1985-11-12 Trw Inc. Easily schedulable horizontal computer
US4523273A (en) * 1982-12-23 1985-06-11 Purdue Research Foundation Extra stage cube
US4644496A (en) * 1983-01-11 1987-02-17 Iowa State University Research Foundation, Inc. Apparatus, methods, and systems for computer information transfer
US4598400A (en) * 1983-05-31 1986-07-01 Thinking Machines Corporation Method and apparatus for routing message packets
US4814973A (en) * 1983-05-31 1989-03-21 Hillis W Daniel Parallel processor
US4805091A (en) * 1985-06-04 1989-02-14 Thinking Machines Corporation Method and apparatus for interconnecting processors in a hyper-dimensional array

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0206580A3 (de) * 1985-06-04 1988-05-18 Thinking Machines Corporation Method and apparatus for interconnecting processors in a hyper-dimensional array
GB2201817A (en) * 1987-02-27 1988-09-07 Massachusetts Inst Technology Geometry-defining processors
EP0358704A4 (en) * 1987-04-27 1990-12-19 Thinking Machines Corporation Method and apparatus for simulating m-dimension connection networks in an n-dimension network where m is less than n
GB2206428A (en) * 1987-06-15 1989-01-05 Texas Instruments Ltd Computer
EP0386151A4 (en) * 1987-11-10 1992-08-05 Echelon Systems Multiprocessor intelligent cell for a network which provides sensing, bidirectional communications and control
FR2626091A1 (fr) * 1988-01-15 1989-07-21 Thomson Csf High-power computer and computing device comprising a plurality of computers
EP0325504A1 (de) * 1988-01-15 1989-07-26 Thomson-Csf High-performance computer with a plurality of computers
US10069674B2 (en) 2013-12-12 2018-09-04 International Business Machines Corporation Monitoring file system operations between a client computer and a file server
US10075326B2 (en) 2013-12-12 2018-09-11 International Business Machines Corporation Monitoring file system operations between a client computer and a file server

Also Published As

Publication number Publication date
EP0201797A3 (en) 1988-06-01
EP0201797B1 (de) 1993-02-17
US5113523A (en) 1992-05-12
DE3687764T2 (de) 1993-10-07
DE3687764D1 (de) 1993-03-25

Similar Documents

Publication Publication Date Title
EP0201797B1 (de) High-performance computer system
US5121498A (en) Translator for translating source code for selective unrolling of loops in the source code
Kuck et al. The Burroughs scientific processor (BSP)
US5038282A (en) Synchronous processor with simultaneous instruction processing and data transfer
US8489858B2 (en) Methods and apparatus for scalable array processor interrupt detection and response
US6219775B1 (en) Massively parallel computer including auxiliary vector processor
US4016545A (en) Plural memory controller apparatus
US4031517A (en) Emulation of target system interrupts through the use of counters
EP0490524A2 (de) Pipeline-Verfahren und -Gerät
EP0124402A2 (de) Mikroprozessor
JPS5911943B2 (ja) Trap mechanism for a data processing device
Bodenstab et al. The UNIX system: UNIX operating system porting experiences
KR100694212B1 (ko) Distributed operating system for increasing data processing performance in a multi-processor architecture, and method therefor
Heath Microprocessor architectures and systems: RISC, CISC and DSP
EP0107263A2 (de) Mensch-Maschine Schnittstelle
WO1980000758A1 (en) Modular programmable signal processor
JP3074386B2 (ja) Parallel processor
Simar et al. A 40 MFLOPS digital signal processor: The first supercomputer on a chip
Mattson et al. Imagine programming system user’s guide
Palmer A VLSI parallel supercomputer
US7137109B2 (en) System and method for managing access to a controlled space in a simulator environment
Jensen et al. The Honeywell Modular Microprogram Machine: M3
Loughlin NOSC Advanced Systolic Array Processor (ASAP)
Otten et al. Implementation of KRoC on Analog Devices’" SHARC" DSP
Tirpak Jr Software Development on the High-Speed Systolic Array Processor (HISSAP): Lessons Learned.

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): DE FR GB IT

PUAL Search report despatched

Free format text: ORIGINAL CODE: 0009013

AK Designated contracting states

Kind code of ref document: A3

Designated state(s): DE FR GB IT

17P Request for examination filed

Effective date: 19881115

17Q First examination report despatched

Effective date: 19910125

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): DE FR GB IT

REF Corresponds to:

Ref document number: 3687764

Country of ref document: DE

Date of ref document: 19930325

ITF It: translation for a ep patent filed
ITTA It: last paid annual fee
ET Fr: translation filed
PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

26N No opposition filed
REG Reference to a national code

Ref country code: GB

Ref legal event code: IF02

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: FR

Payment date: 20050408

Year of fee payment: 20

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: DE

Payment date: 20050421

Year of fee payment: 20

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 20050427

Year of fee payment: 20

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: IT

Payment date: 20050430

Year of fee payment: 20

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GB

Free format text: LAPSE BECAUSE OF EXPIRATION OF PROTECTION

Effective date: 20060428

REG Reference to a national code

Ref country code: GB

Ref legal event code: PE20