EP0244480A1 - Integrated multicomputer data processing system - Google Patents

Integrated multicomputer data processing system

Info

Publication number
EP0244480A1
Authority
EP
European Patent Office
Prior art keywords
processor
data
memory
user
section
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP19870900365
Other languages
German (de)
English (en)
Inventor
Glen J. Culler
Robert B. Pearson
Michael Mccammon
William L. Proctor
John L. Richardson
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CULLER SCIENTIFIC SYSTEMS Corp
Original Assignee
CULLER SCIENTIFIC SYSTEMS Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CULLER SCIENTIFIC SYSTEMS Corp filed Critical CULLER SCIENTIFIC SYSTEMS Corp
Publication of EP0244480A1


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3877Concurrent instruction execution, e.g. pipeline, look ahead using a slave processor, e.g. coprocessor
    • G06F9/3879Concurrent instruction execution, e.g. pipeline, look ahead using a slave processor, e.g. coprocessor for non-native instruction execution, e.g. executing a command; for Java instruction set

Definitions

  • This invention relates generally to a data processing system having a plurality of user processors for performing parallel numerical operations assigned to the user processors and more specifically to an integrated, multicomputer data processing system having a user processor section wherein the user processor includes an organizational processor and math processor capable of operating in parallel for performing numerical operations by executing specific portions of an instruction stream in parallel.
  • FIG. 1 discloses a multicomputer system which increases processing speed of a computer by use of a plurality of independent computers operating in parallel.
  • the multicomputer system of FIG. 1 is capable of handling multiple tasks by use of multiple processor/memory combinations which perform parallel task execution.
  • Such a multicomputer system is generally referred to as a "distributed" system because the system operates on the principle that separate tasks, illustrated by rectangles 30 in FIG. 1, are distributed to separate computing systems through memories, illustrated by rectangles 34, which are operatively coupled to a plurality of parallel CPU's designated by rectangles 36.
  • the performance limits of each of the individual CPU's 36, which typically are individual microcomputers, are in the range of 1.5 to 2 million instructions per second ("MIPS") of integer performance, with limited floating point performance levels.
  • efficient multicomputer systems are required which maintain close coupling mechanisms.
  • the multicomputer system of FIG. 1 requires that the operating system design be inherently capable of managing multiple task processors efficiently.
  • the multicomputer systems illustrated in FIG. 1 are utilized in low-cost applications having a large volume of low-performance tasks to be performed such that the tasks can be distributed among several microcomputers.
  • such low-performance tasks include business and personal computer applications.
  • Multicomputer systems of the type illustrated in FIG. 1 are limited with respect to high-performance and high math applications such as, for example, bound scientific applications including simulations, signal processing, image processing, and the like.
  • Another known approach to increasing processing speed is the use of a coprocessor system, which is illustrated in FIG. 2, captioned "Prior Art."
  • the coprocessor of FIG. 2 is capable of handling single tasks by use of pre/post processors, based upon parallel, multi-CPU instruction execution.
  • Another known approach for increasing the processing speed of a large-scale digital computer is generally referred to as a pipelined computer system, such as that illustrated in FIG. 3, labeled "Prior Art."
  • the computer system relies on certain known parameters such as, for example, that certain types of computer operations are deterministic, that is, predictable to a high degree.
  • the cache is a major contributor to the speed, since it acts as a super high-speed "scratchpad" memory for the computer. It serves as a means for the parallel prefetch hardware, which races ahead to fetch data from memory, to store data, as well as a place where the processor can quickly find the data when it is time to start the next instruction.
  • a cache memory may be cycling at 50 nanoseconds, compared to regular memory which may be cycling at 400 nanoseconds, such that considerable time can be saved.
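To make the benefit concrete, a rough effective-access-time calculation using the 50 ns and 400 ns figures above is sketched below in C. The 90% hit rate is an assumed value chosen only for the example, not a figure taken from the patent.

```c
/* Illustrative only: effective memory access time for the cache/main-memory
 * timings cited above (50 ns cache, 400 ns main memory). The 90% hit rate
 * is an assumed figure for the example, not a value from the patent. */
#include <stdio.h>

int main(void)
{
    double t_cache = 50.0;    /* cache cycle time, ns   */
    double t_main  = 400.0;   /* main memory cycle, ns  */
    double hit     = 0.90;    /* assumed cache hit rate */

    double t_eff = hit * t_cache + (1.0 - hit) * t_main;
    printf("effective access time: %.1f ns\n", t_eff);   /* 85.0 ns */
    return 0;
}
```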
  • When a branch instruction or subroutine is required, the memory cache must be emptied and reloaded with instructions starting at the new branch location.
  • This invention relates to a new, novel and unique modular multicomputer system whose architecture is based upon a modular multicomputer design in which a plurality of integrated user processors operate in parallel from instructions which form part of an instruction stream, under scheduling provided by an operating system processor and a system controller within the integrated, multicomputer data processing system.
  • the integrated, multicomputer data processing system of the present invention has a control section, which includes means for system input and output, and user processor sections, each having a math processor and an organizational processor which loads and unloads data and instructions, enabling the math processor to operate at maximum efficiency.
  • the user processor section includes means responsive to the addressing signals for fetching instructions from a loaded instruction cache in the user processor section and for enabling the math processor to execute programmed arithmetic operations on the loaded data signals in response to command signals and tasks scheduled for the user processor by the operating system processor via the system controller.
  • the computer architecture is based upon the type of operations to be performed by the computer, be they determinative or nondeterminative operations.
  • high-speed, special purpose computers are formed by a combination of scientific computers with special purpose, high-speed array processors to handle the determinative calculations which are generally required for bound applications.
  • the processing problems comprise a blend of integer and array processing requirements. If a large amount of array processing content is contained within the scientific application, supermini computers are used. However, they have slow floating point capability and, as such, are not able to perform the desired array processing at high speed. Thus, an array processor, used as a special purpose, high-speed digital computer, is required in combination with a scientific computer to obtain the desired speeds. This results in a higher price-to-performance ratio. Usually, computer systems having the lowest price-to-performance ratio are desired for scientific applications.
  • FIG. 1 is a prior art logic diagram which represents a multicomputer system which is capable of performing multiple tasks by means of a multiple processor/memory combination and parallel task execution;
  • FIG. 2 is a prior art logic diagram which represents a coprocessor computer system of the prior art which is capable of performing a single task using pre-post processing parallel multiple central processing units for executing the instructions;
  • Fig. 3 is a prior art logic diagram of a prior art pipelined computer system which is capable of performing a single task with multistage execution utilizing a single processing unit and serial execution;
  • Fig. 4 is a diagrammatic representation of an integrated, multicomputer data processing system which includes one user processor section having a math processor and an organizational processor;
  • FIG. 5 is a diagram representing an integrated, multicomputer data processing system having N user processor sections
  • FIG. 6 is a block diagram of a typical user processor showing the various components thereof;
  • FIG. 7 is a diagrammatic representation of a multiple task, multiple processor/memory combination which demonstrates the capability of the integrated, multicomputer data processing system of the present invention interfaced with a plurality of work stations;
  • FIG. 8 is a logic diagram showing the functional relationships between the user processor, system memory bus, and staging address bus for transferring instructions and data in parallel into the user processor;
  • FIG. 9 is a logic diagram illustrating the relationship between the operating system processor, which is operatively coupled to an input/output data transfer means, and a system controller;
  • FIG. 10 is a logic diagram illustrating the relationship between the system controller, the system memory bus, and the system addressing bus;
  • FIG. 11 is a logic diagram illustrating the various components of an XY machine which functions as the math processor and an A machine which functions as the organizational processor, each of which is capable of performing arithmetic operations;
  • FIG. 12 is a block logic diagram illustrating the operation of the A machine in performing arithmetic calculations in the user processor
  • FIG. 13 is a block diagram showing the relationship between the various components of the XY machine, showing specifically the shared arithmetic components between the X machine segment and the Y machine segment thereof
  • FIG. 14(A) and 14(B) are block diagrams illustrating the arithmetic registers for the X input and Y input of the XY machine, respectively;
  • FIG. 15 is a block logic diagram illustrating the X memory address controller
  • FIG. 16 is a block logic diagram illustrating the data path for the XY random access memory
  • FIG. 17 is a block diagram illustrating the operation of the instruction sequencer which controls the decoding of instruction to provide control signals to the XY machine and the A machine;
  • FIG. 18 is a diagrammatic representation of a compiler illustrating a variety of front end language processors applied through a compiler interface to a back end user processor code generator which generates executable application object codes for executing an optimum number of calculations of the mix of operations in the user processor;
  • FIG. 19 is a block diagram illustrating the operating system functions of the integrated, multicomputer data processing system wherein a wide variety of standard input-output devices at the applications layer are interfaced to the system interface layer and the tasks are distributed to a hardware layer and executed in parallel.
  • FIGS. 1, 2 and 3 relate to the prior art computer system architectures and means for executing instructions and have been discussed hereinbefore in the Description of the Prior Art section.
  • FIG. 4 diagrammatically represents the integrated, multicomputer data processing system of the present invention having a global memory section 100, a high-speed input-output section 102 which is operatively connected to a fast disk 104, and a control section shown generally by dashed lines 108.
  • the control section includes an input/output data transfer means 110 for system input and output. The input/output data transfer means 110 is adapted to network or interact with a wide variety of state-of-the-art peripheral devices such as, for example, a system disk 114, a terminal 116, magnetic tape 118, a network of work stations 120 and 122 which are operatively connected through a networking system such as ETHERNET 126 to the input/output data transfer means 110, and data communications 128 which likewise are connected to the input/output data transfer means 110.
  • the control section 108 includes a system memory bus 132 and staging bus 136, each of which is capable of transferring 64 bits of data and 32 bits of address.
  • the control section 108 further includes an operating system processor 140 which is operatively connected to the input/output data transfer means 110 for input-output operations by a connecting means illustrated by arrow 142. Also, an operating system processor random access memory 146 is operatively connected to the operating system processor 140 through a connecting means illustrated by arrow 148. Also, the random access memory 146 is operatively connected to the input/output data transfer means through a connecting means, as illustrated by arrow 150.
  • the control section 108 further includes a system controller 154 which operates under control of the operating system processor 140.
  • the system controller 154 is operatively connected to the input/output data transfer means by a communication means which is represented by arrow 158.
  • the operating system processor 140 controls the operation of the system controller 154 by means of execution signals and interrupt signals which are applied from the operating system processor 140 and connecting means 142 through the input/output data transfer means 110 and communication means 158 to the system controller 154.
  • the system controller 154 is operatively coupled via connecting means illustrated by arrow 162 to the system memory bus 132 and is connected via a communication means illustrated by arrow 164 to the staging bus 136.
  • the integrated, multicomputer data processing system is a modular system and includes at least one user processor section illustrated by the dashed box 200 which is operatively connected between the system memory bus 132 and the staging bus 136.
  • the user processor section 200 includes an organizational processor 202 having a random access local memory 210 and a math processing section 218.
  • the random access local memory 210 is operatively connected to the organizational processor 202 through a connecting means evidenced by arrow 212, and a multistage, first-in, first-out (FIFO) buffer register 214.
  • the FIFO buffer register 214 is connected both to the organizational processor 202 and to the math processor section 218 by a connecting means 216.
  • the math processor section 218 is connected to the local memory 210 through a single-stage output register 220 and connecting means 222.
  • the local memory 210 is operatively connected to the system memory bus 132 by a connecting means as evidenced by arrow 218.
  • the organizational processor 202 is connected by a connecting means 226 to the system controller 154.
  • the user processor 200 includes a program memory 228 which is operatively coupled to the organizational processor 202 by a connecting means shown by arrow 230.
  • the program memory (sometimes referred to as the "P Memory") 228 is also operatively connected to the staging bus 136 via a connecting means represented by arrow 232.
  • the staging bus 136 loads the data caches 246 and 248 via connecting means 232 and 206 respectively.
  • the user processor's math processing section 218 includes two sets of dual data cache memories 246 and 248.
  • the math processing section 240 includes a math processor 252 having a plurality of connecting means, shown generally as 260, which interconnects each of the two sets of dual data caches 246 and 248 to the math processor 252 such that data can be transferred therebetween.
  • the data caches 246 of the math processing section 218 and the P Memory 228 are likewise connected to the staging bus 136 by connecting means illustrated by arrow 232.
  • Data transfer from the local memory 210 to the organizational processor 202 occurs over connecting means 232 and an input buffering means such as the FIFO buffering register 214.
  • Data transfer from the math processor section 218 occurs through the output means 220, which in the preferred embodiment is in the form of a single stage, clocked output register which is capable of transferring numeric results from the math processing section 218 to the output register 220 in one clock cycle.
  • the output register 220 transfers the same to the local memory 210 for further processing.
  • the communication between the local memory 210 and the organizational processor 202 is in the form of an asynchronous transfer of data and, in the preferred embodiment, utilizes a three-stage, first-in-first-out buffer register as an input buffering means 214.
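Functionally, the three-stage first-in-first-out buffering register can be modeled as a small ring buffer. The sketch below is illustrative only, assuming a software model of the hardware behavior; the actual FIFO is a clocked, 64-bit-wide register file, and the names used here are not the patent's.

```c
/* Minimal sketch of a three-stage, first-in-first-out buffer of the kind
 * described for buffering register 214. Widths and names are illustrative. */
#include <stdint.h>
#include <stdbool.h>

#define FIFO_DEPTH 3

typedef struct {
    uint64_t slot[FIFO_DEPTH];
    int head;   /* next slot to read      */
    int tail;   /* next slot to write     */
    int count;  /* entries currently held */
} fifo3_t;

static bool fifo_push(fifo3_t *f, uint64_t word)
{
    if (f->count == FIFO_DEPTH)
        return false;                      /* producer must wait */
    f->slot[f->tail] = word;
    f->tail = (f->tail + 1) % FIFO_DEPTH;
    f->count++;
    return true;
}

static bool fifo_pop(fifo3_t *f, uint64_t *word)
{
    if (f->count == 0)
        return false;                      /* consumer must wait */
    *word = f->slot[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return true;
}
```

In this model the local memory side pushes words while the organizational processor side pops them, which is what permits the asynchronous, decoupled transfer the text describes.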
  • the math processing section 218 is operatively connected to the P Memory 228 by connecting means illustrated by arrow 276.
  • the organizational processor 202 has direct communication to the math processor section 218 over communication means 216 which permits data transfer therebetween without going through a register.
  • the organizational processor functions to keep a number of instructions ready for the math processor such that an instruction can be immediately loaded into the math processor together with the applicable data.
  • the organizational processor loads the next instructions and data while the result of the arithmetic operation just completed by the math processor is loaded into the local memory.
  • the operating system processor 140 controls operation of the integrated, multicomputer data processing system by means of the input/output data transfer means 110 which interfaces with the input-output devices 114 through 128, and which applies interrupt and execution signals to the system controller 154 to control overall system operation.
  • the system controller in turn, controls the organizational processor 202 via communication means 226.
  • the system controller 154 under control of the operating system processor 140 functions to load data from the system memory bus 132 into the user processor 200.
  • the staging bus 136 loads the user processor 200 with data and instructions concerning the programmed arithmetic operations.
  • the user processor 200 is able to communicate over the system memory bus 132 with the high-speed input-output processor 102 by a connecting means shown generally by arrow 280.
  • the user processor 200 can communicate with the global memory 100 by means of the system memory bus 132 and a connecting means illustrated by arrow 282.
  • the integrated, multicomputer data processing system is modular in design and can have additional user processor sections added thereon to increase both MIPS and MFLOP operating rates of the data processing system.
  • the operating system processor 140 including its random access memory 146, operates to schedule and assign tasks and command signals through the system controller 154 over the system memory bus 132 and the staging bus 136 to each user processor.
  • Information is available to the operating system processor from any of the peripheral devices 114, 116, 118, 120, 122, 126 or 128 by means of the input/output data transfer means 110.
  • the input/output data transfer means also is capable of loading information directly into or receiving information from the random access memory 146 and from the operating system processor 140.
  • the operating system processor 140 via the connecting means 142, input/output data transfer means 110, and connecting means 158 schedules and assigns tasks to each user processor independently through the system controller 154.
  • the operating system processor 140 generates command signals in the form of interrupt signals and execution signals for controlling system operations and also, through the system controller 154, controls operation of the high-speed input-output processor 102, including the fast disk 104 and the global memory 100.
  • FIG. 5 illustrates an integrated, multicomputer data processing system which includes the same components illustrated in FIG. 4, designated by the same numbers utilized in FIG. 4, together with the first user processor section 200, a second user processor section 302, and an Nth user processor section 304.
  • the integrated, multicomputer data processing system is capable of operating with up to four user processors based upon the design set forth herein.
  • the operating system processor, system controller, system memory bus, and staging bus could be designed and programmed with appropriate compilers and the like in order to handle more than four user processors in order to increase the MIPS and MFLOP rates of processing data.
  • the operating system processor 140 controls operation of each of the user processors 200, 302 and 304 as illustrated in FIG. 5.
  • FIG. 4 could be duplicated, for example with four user processors in a group, and then arranged with each group in a cluster configuration having a cluster operating system processor to schedule and assign tasks to each group of four user processors, all of which operate independently.
  • FIG. 6 illustrates in greater detail the method for loading both data signals and addressing signals into the user processor.
  • FIG. 6 illustrates that the system memory bus 132 also includes a system address bus shown as 132' and that the staging data bus 136 includes a staging address bus 136'.
  • FIG. 6 illustrates the major components and connecting means with the same numeral designations as those of FIG. 4 and further includes a connecting means between the organizational processor 202 and the system address bus 132', illustrated by arrow 204'.
  • the organizational processor 202 is operatively connected to the staging address bus 136' by connecting means shown as arrow 206'.
  • the local memory 210 is operatively connected by a connecting means shown by arrow 218' to the system address bus 132'.
  • the math processing section 240 is likewise connected to the staging address bus 136' by a connecting means shown as arrow 266'.
  • the transfer of data signals and address signals is performed by means of the system memory bus 132, the system address bus 132', the staging data bus 136 and the staging address bus 136'.
  • the system controller 154 is operatively coupled to the operating system processor 140 for receiving command signals which command the system controller 154 to transfer data signals over the staging data bus 136 and to transfer address signals over the staging address bus 136' to the user processor 200 to preload the user processor for the next operation while the user processor is executing a program.
  • the system controller concurrently transfers data signals over the system memory bus 132 and addressing signals over the system address bus 132'.
  • the operating system processor 140 is generally referred to as a Kernel Processor.
  • the operating system processor comprises a Sun 2 68010 processor/memory board with two megabytes of memory and an attached input/output data transfer means for system I/O.
  • the Kernel Processor runs the Berkeley version 4.2bsd UNIX operating system.
  • When the operating system processor 140 desires to initiate a task in another processor, it fills out a task control block specifying the task, its size in bytes, and its starting address in memory, together with additional information.
  • the operating system processor 140 then commands the system controller 154 to load the task into the user processor, such as user processor 200, followed by a later order to the user processor 200 to begin executing the task.
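For illustration, the task control block might be modeled as a small structure. Only the fields named in the text (the task, its size in bytes, its starting address) are grounded in the patent; the remaining fields are hypothetical stand-ins for the unspecified "additional information."

```c
/* Hypothetical layout of the task control block described above. */
#include <stdint.h>

typedef struct task_control_block {
    uint32_t task_id;        /* identifies the task to be loaded             */
    uint32_t size_bytes;     /* size of the task image in bytes              */
    uint32_t start_address;  /* starting address of the task in memory       */
    uint32_t user_processor; /* assumed: target user processor (e.g. 0..3)   */
    uint32_t flags;          /* assumed: placeholder for other control info  */
} task_control_block_t;

/* Assumed flow: the operating system processor fills out the block, hands it
 * to the system controller, and later orders the user processor to execute. */
```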
  • the system controller 154 is a special block transfer device which loads the user processors 200, 302 and 304 (FIG. 5) with tasks to be executed based upon commands from the operating system processor 140. In essence, the system controller 154 operates directly under control of the operating system processor 140.
  • the system controller 154 performs two specific functions, namely: (1) transfer of data between a user processor and common data memory; and (2) support of transfers between the input/output devices of the input/output data transfer means and common data memory.
  • the user processor 200 has a large program cache 228 which has a capacity on the order of 256 KB.
  • the user processor 200 includes duplicated context switching facilities which permit the user processor 200 to execute one task while the system controller 154 is removing the last executed task from the user processor or is loading the next application task into the user processor.
  • the system controller 154 performs the function of saving the information in common data memory and later fetching the same from storage on disk.
  • the system controller 154 functions to keep the user processor 200 operating at maximum efficiency by minimizing waiting time experienced by the user processor 200 waiting to receive a new task or transferring data during the various staging processes.
  • the system controller 154 functions to enable the user processor 200, or any other user processors operatively connected to the system, such as user processors 302 and 304 in FIG. 5, to operate independently at maximum efficiency.
  • the user processor section 200 includes a random access memory 210 and a buffering means 214, in the form of a first-in-first-out buffer register, for enabling programmed, asynchronous transfer of numerical results between organizational processor 202 of the user processor section 200 and the random access memory means 210.
  • the local memory applies the information via connecting means 220 to the first-in-first-out buffer register 214 and receives information from the output register 220.
  • the organizational processor 202 includes means which are responsive to an instruction stream read out from the instruction cache 228 in response to instruction addresses.
  • a certain portion of the instruction stream relates to the math processor 252 and the applicable portion of the instruction stream is decoded to produce control signals which are used to control machine segments having shared elements of the math processor to execute selected mathematic operations on the data signals transferred to the user processor section 200.
  • FIG. 7 illustrates diagrammatically the architectural structure of the integrated, multicomputer data processing system of the present invention for performing multiple tasks utilizing multiple processor/memory combinations.
  • the task source may be initiated in a work station computer with the work station task being identified by rectangles 300 and 302 illustrating "n" work stations.
  • the work station task 300 is applied to a memory 304 while work station tasks 302 are applied to "n" number of memories 306.
  • the memory 304 applies instructions from memory to the work station processor 310 while the other "n" memories 306 likewise apply instructions to "n" processors designated at 312.
  • the outputs of the work station processors 310 and 312 are applied via an input/output data transfer means, such as MULTIBUS or ETHERNET for a local area network, as illustrated in FIG.
  • the operating system processor 140 receives a task such as, for example, a task assigned from the work station processor 310, which is designated by rectangle 320.
  • the task is then stored in the operating system computer memory 146 which, in turn, applies the instructions to the operating system processor 140, which ultimately applies the same to a data memory.
  • the operating system processor 140 then applies appropriate command signals together with the appropriate instructions and data via the system controller 154, to the user processor section.
  • the application task being assigned to the user processor is designated by rectangle 342 for a single task and by rectangles 344 for application tasks being assigned to other user processors.
  • the application task 342 assigned to the user processor is stored in the user processor memory, such as memory 210.
  • Other user processors apply their tasks to the associated "n" memories designated as 348.
  • the memories 210 then apply the instructions to the organizational processor 202 and to the instruction cache which is represented as a processing and instruction cache by rectangle 350 in FIG. 7.
  • the instruction cache included in rectangles 350 and 352 can be loaded from the staging bus.
  • the processing and instruction caches for the other "n" user processors are designated by rectangles 352.
  • An instruction cache and processing section, shown generally by rectangle 350, performs the function of receiving an instruction stream and processing the same such that certain portions of the instruction stream relating to the mathematical processor are decoded to produce a control signal which is applied to the appropriate segment of the math processor 252. Similarly, certain portions of the instruction stream are decoded to produce a control signal which is applied to the organizational processor 202, which is capable of performing simple arithmetic calculations.
  • the math processor 252 and the organizational processor 202 execute the programmed operations on the data and transfer the results therefrom to the local memory 210 through an output register such as the single stage output register 272 in FIG. 1.
  • the organizational processor 202 also ensures that sufficient instructions and data are available for loading into the math processor upon completion of a performed arithmetic operation to keep the math processor operating at maximum efficiency.
  • the user processor as a separate processor application section, can be viewed as a single task, parallel execution computer system.
  • FIG. 8 is a logical block diagram representing the architecture of a user processor 200 illustrated in FIGS. 4 and 6 and of the additional user processors 302 and 304 as illustrated in FIG. 5.
  • the logic diagram illustrates an integrated, multicomputer data processing system having two identical user processor sections shown as 540.
  • Transfer of information into the integrated, multicomputer data processing system from the outside world is accomplished through peripheral devices which are applied to an input/output data transfer means 500 which, in turn, transfers the information between the input-output devices and the operating system processor 502.
  • the operating system processor 502 communicates over the input/output data transfer means 500 with the system controller shown by rectangle 506.
  • the system controller is operatively connected to a system memory bus 510 and a system address bus 512.
  • the system memory bus is capable of communicating 64 bits of data and 32 bits of address within the data processing system.
  • the system controller 506 is operatively connected to the staging data bus 514 and to a staging address bus 516.
  • the staging data bus transfers data having 64 bits and addresses having 32 bits within the data processing system to specific user processors.
  • the system memory bus 510 and the system address bus 512 are operatively connected to a global memory 522 by means of a system memory bus segment 524 and a system address bus segment 526.
  • Local data memories 548 are likewise connected to the system data bus 510 via bus segment 536 and to the system address bus 512 via bus segment 558.
  • the controller and high-speed interface 528 communicate to the high-speed input-output device over a bus 534.
  • the controller and high-speed interface 528 likewise is operatively connected to the system memory bus 510 via a system memory bus segment 530 and to the system address bus 512 by system address bus segment 532.
  • the controller and high-speed interface 528 is connected to the staging data bus 514 by means of staging data bus segment 518 and to the staging address bus 516 by staging address bus segment 520.
  • the controller and high-speed interface 528 is adapted to apply data to and receive data from each of a system memory bus 510, system address bus 512, staging data bus 514, and staging address bus 516.
  • each user processor 540 includes a math processor, generally referred to as the "XY machine" 544, an organizational processor 546, generally referred to as the A machine, and a local memory 548. Also, each user processor includes an instruction cache or program memory 550.
  • the system memory bus 510 is operatively connected to the XY machine 544, to the A machine 546 and to the local memory 548 by means of a local data bus segment 556.
  • System address bus 512 is operatively connected to the organizational processor 546 and to the local memory 548 by means of a local address bus segment 558.
  • the instruction cache 550 is operatively connected to the staging data bus 514 via staging bus segment 560 and to the staging address bus 516 via a staging address bus segment 556.
  • the staging data bus 514 is operatively connected to the XY machine 544 by means of staging data bus segment 562.
  • the XY machine 544 is operatively connected to the staging address bus 516 via staging address bus segment 564.
  • the instruction cache 550 is operatively connected to the XY machine via a bus 570 and to the organizational processor, or the A machine, by bus segments 522.
  • the system controller 506 under control of the operating system processor 502 transfers data into and out of the user processors 540 by means of the system memory bus 510 and the staging data bus 514. The transfer of data can be in parallel, as evidenced by each bus segment over the system data bus to global memory.
  • FIG. 9 is a logic block diagram which illustrates the process utilized by the integrated, multicomputer data processing system for preloading the user processor with data signals and addressing signals to ensure that the user processor is continuously loaded with assigned tasks in the form of programs and data and for transferring executed arithmetic operation results from the user processor P Memory and then to the input/output data transfer means and I/O devices.
  • the operating system processor 502 is operatively connected to the input/output data transfer means 500 to transfer information between the input-output devices and the operating system processor 502.
  • the operating system processor 502 is operatively connected to the system controller 506 and applies program signals, interrupt signals, and execution signals to the system controller 506.
  • When the operating system processor 502 desires to initiate a task in the form of a programmed arithmetic operation in a user processor, it completes a task control block specifying the task or arithmetic operation to be performed, the size of the task in bytes, and the starting address in memory, together with additional control information.
  • When the task information has been assembled by the operating system processor 502, the operating system processor 502 generates an interrupt signal which is applied to the system controller 506 by the input/output data transfer means 500.
  • the system controller 506 receives the task control block of information from the operating system processor 502 and processes the information contained within the task control block such that the data and instruction addresses required for the system memories are loaded over the system memory bus 132 to the user processor, and the staging information, namely the programmed arithmetic operation together with the data signals and addressing signals therefor, is loaded into the user processor over the staging bus 136.
  • the operating system processor 502, together with the system controller 506, is capable of controlling up to four user processors as illustrated in FIG. 5.
  • the system memory bus 132 is capable of transferring data between up to eight banks of memory wherein each memory comprises up to 32 megabytes and the data can be transferred at a rate of 56 megabytes per second.
  • each user processor has a local memory 210 which is an associated local memory for that specific user processor and the system memory bus 132 has system access to all of the memory banks including the local memory banks.
  • FIG. 10 is a logical block diagram which illustrates the logic scheme wherein a math processor 546 can gain access to its local memory 548 without the necessity of utilizing the system memory bus 136.
  • This permits the operating system processor to determine, on a priority scheme, whether the system memory bus 136 has a higher priority data transfer request and, if so, the user processor 546 requiring data can access the same from the local memory 548 over the local data bus segment 556. However, if the user processor 546 requires data located in another memory, such as memory 548, that information can only be accessed over the system memory bus 136. If the user processor 546 requires data signals and addressing signals which are to be loaded therein from the staging bus, the staging bus 132 must directly load the user processor 546 with the required information through the system controller 506 under control of the operating system processor 502.
  • each of the user processors 540 includes an XY machine which functions as the math processor.
  • the XY machine includes memories which are capable of processing data at the rate of 56 MB per second.
  • the X machine and Y machine include staging memories which are able to transfer data at the rate of 56 MB per second.
  • Each user processor includes instruction and data page tables, a program memory organized as 128 pages of 2 KB each capable of transferring data at the rate of 112 MB per second, and data memories organized into 8 KB pages of memory wherein the data memories are capable of transferring data at the rate of 56 MB per second.
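As an illustrative aside, the page sizes quoted above imply a simple address split: a 256 KB program memory with 2 KB (2^11 byte) pages yields a 7-bit page number, and the 8 KB (2^13 byte) data pages yield a 13-bit offset. The helper names below are ours, not the patent's.

```c
/* Sketch of the address split implied by the page sizes above. */
#include <stdint.h>

#define PROG_PAGE_BITS 11u   /* 2 KB program pages */
#define DATA_PAGE_BITS 13u   /* 8 KB data pages    */

static inline uint32_t prog_page(uint32_t addr)   { return addr >> PROG_PAGE_BITS; }
static inline uint32_t prog_offset(uint32_t addr) { return addr & ((1u << PROG_PAGE_BITS) - 1u); }

static inline uint32_t data_page(uint32_t addr)   { return addr >> DATA_PAGE_BITS; }
static inline uint32_t data_offset(uint32_t addr) { return addr & ((1u << DATA_PAGE_BITS) - 1u); }
```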
  • FIG. 11 is a logical block diagram illustrating the elements of the XY machine which functions as the math processor and of the A machine which functions as the organizational processor.
  • the A machine includes a program memory 600 which provides an instruction stream selected by means of a program sequencer 604 which is responsive to addressing signals.
  • the output of the program sequencer 604 is applied to an address translation unit 606 which provides the specific location in the program memory 600.
  • the information contained in the program memory 600 at the location address derived by the address translation unit 606 is read out as an instruction stream into the instruction buffer 608.
  • Instruction buffer 608 applies the specific instruction, in the form of a 96-bit length word, into an instruction extraction means 610.
  • the instruction extraction means 610 produces two instructions as an output, an XY instruction as shown by arrow 612, and an A instruction as shown by arrow 614.
  • the A machine instruction 614 is applied to an A decoder 616 which produces a control signal as an output, as illustrated by arrow 620.
  • the output is applied to the remaining portion of the A machine, or organizational processor, which is capable of responding to the control signal to execute basic arithmetic functions such as shifts, logic operations (AND, OR, NOT, XOR, etc.), addition, subtraction, multiplication, and control functions (e.g., branch).
  • the A machine includes a register file 630 and a set of special registers 632 to provide the data upon which the arithmetic operation is to be performed, as designated by the A control 620.
  • the output of register file 630 is applied to a left shift unit 634, to a multiplication unit 636, or to an ALU (adder) unit 638.
  • the output of the ALU 638 is returned to one of the A machine special registers.
  • the arithmetic results are applied to either the XB or YB bus so that synchronous transfer of data can be obtained between the organizational processor (A machine) and the XY machine.
  • the operation of the A machine is under clocked control such that each arithmetic operation is performed in one clock cycle.
  • the XY instruction 612 is applied to a decode random access memory 650. If the XY instructions appearing on XY instruction input 612 contain a branch or subroutine component, that requirement is determined by a microsequencer 652 which, in turn, enables a microcode random access memory 654 to produce a microcoded instruction signal which is applied to the XY control 656. In any case, the decode random access memory 650 applies the decoded instruction during the first clock cycle to the XY control 656.
  • the XY machine has two major segments with shared elements.
  • the X machine segment includes a memory 660 which is responsive to an address unit 662 wherein data from registers 664 and 666 are loaded by the address unit 662 into the X memory 660.
  • the X machine segment contains two registers 670 and 672 which are adapted to store data read from the X memory 660.
  • the X machine segment includes a simple or basic arithmetic processor shown generally as 680 which is capable of performing IEEE/INTEGER, ALU and conversion operations, the output of which is applied to output registers 674 and 676. The output of the registers 674 and 676 is applied to the output buffer register wherein the results of the arithmetic operations are transferred directly to the organizational processor.
  • the Y machine segment includes a Y memory 684 which is controlled by Y address unit 686 which is capable of loading data from registers 690 and 692 into the Y memory. Information is read from the Y memory and is stored in registers 694 and 696.
  • the Y machine segment includes an arithmetic processor 700 which is capable of performing IEEE/INTEGER operations, multiplication, and elementary functions such as square root, divide, sine, cosine, arctangent, exponential, logarithm, and the like. The results of the arithmetic processor 700 are stored in registers 702 and 704.
  • Registers 670, 672, 694 and 696 are capable of having information from either the X machine segment or the Y machine segment gated therein under controlled gating transfers.
  • information stored in registers 674, 676, 702 and 704 can be gated therein from either the X machine segment or the Y machine segment, all as shown by the letter designations on the registers.
  • a switch 710 is located intermediate the XY machine and functions to control the gating of information into the various registers, as described above, or to control the gating of information into either the processor 680 of the X machine segment or the processor 700 of the Y machine segment.
  • FIG. 12 illustrates the flexibility of the A machine in terms of gating of instructions and data among the various registers, arithmetic units, and buses which are interconnected with the XY machine.
  • the instruction address unit 610, the A register file 630, the A special registers 632, the shift left unit 634, the multiplication unit 636, and the ALU 638 are numbered with the same numeral designations as in FIG. 11.
  • the A machine is interconnected by a plurality of buses such as, for example, AB bus 720, ALU bus 724, AR bus 744, X bus 732, Y bus 734, D data bus 740, and XY CB bus 742.
  • buses comprising the X bus 732 and Y bus 734 are used for bidirectional data, while the D data bus 740 and CB bus 742 are used for data interface.
  • the A special registers 632 are gated onto the AB bus 720 through a gating means 722 and into the A input of the ALU 638.
  • the B output of the A register file 630 is gated into the shift left unit 634, the output of which is gated into the B side of the ALU 638.
  • the ALU 638 produces an output which is applied to the AR bus 744 or to one of the A special registers 632.
  • FIG. 13 illustrates the X machine segment and the Y machine segment, which share common elements, by means of a block diagram including all the various buses and gating means therein.
  • the components such as, for example, the Y memory 684 and the X memory 660, the X address control 662, the Y address control 686, the various registers 664, 694, 696 and 672, the output registers 674, 676, 702 and 704, and the arithmetic unit 638 and multiplication unit 636, are designated with the same symbols as appear in FIG. 11.
  • the logic diagram of FIG. 13 illustrates that the various buses and gating means permit transfer of information between various system components including from the output registers 674, 676, 702 and 704 which are referred to as V, T, U and W registers, respectively.
  • FIGS. 14(A) and 14(B) are logic diagrams for the arithmetic registers wherein FIG. 14(A) is for the X input and FIG. 14(B) is for the Y input.
  • FIG. 14(A) illustrates that the registers 664 and 696 are gated at the output thereof such that the information contained therein can be applied to other arithmetic registers 676 and 674 to enable the results of the various executed programmed arithmetic operations to be available for subsequent operations.
  • FIG. 14(B) shows that registers 692 and 696 of the Y machine segment can be gated into the registers 702 and 704 of the Y machine segment or, by means of the various buses, can be gated into the registers of the X machine segment.
  • the arithmetic registers of FIGS. 14(A) and 14(B) include transparent or latch registers, shown generally as 693, which can be transparent or latch data depending on the program.
  • FIG. 15 illustrates a logical diagram for the X memory address controller showing the various components thereof.
  • FIG. 15 illustrates that the X memory address controller controls storing and reading of data therein.
  • the address controller is capable of expanding the boundaries of stored information within the memory under control of the address controller such that the length of the data word can vary between a maximum and a minimum.
  • the elements which are common with the elements of FIG. 11 are designated by the same numerals.
  • registers 664 and 666 are adapted to act as the registers utilized for input into the X memory address controller.
  • Register 664, under clocked control, directly transfers data into arithmetic unit 670, and the output of the register 664 can likewise be clocked onto other buses, such as buses 670 and 672.
  • the input register 666 has its output applied to the same arithmetic unit 670 and to arithmetic unit 672.
  • the arithmetic calculators 670 and 672 calculate the amount of memory required for the specific data word in terms of bit storage.
  • the output of processor 670 is applied as an input to the arithmetic unit 672.
  • the output of the arithmetic unit 672 is applied to a maximum-minimum calculator comprising arithmetic units 678 and 680.
  • Arithmetic unit 678 has as one input a minimum memory limit which is determined by the x min circuit 682.
  • the input from the x min circuit 682 is applied to the arithmetic unit 678 to determine whether the required storage is under the minimum.
  • the maximum amount of storage required in the memory is determined by an x max circuit 690, the output of which is applied as one input to the arithmetic unit 680 which determines whether the maximum storage required for the data word is in excess of the maximum set by circuit 690.
  • the amount of storage required in the memory for a data word is determined by arithmetic units 678 and 680.
  • the outputs of units 678 and 680 are applied to a transfer circuit 692.
  • the Y memory controller has an identical YM address controller as illustrated in FIG. 15.
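A hedged sketch of the bounds check just described for the X memory address controller (and its identical Y counterpart) follows: an address computed from the input registers is compared against a lower and an upper limit before the transfer circuit passes it on. The function and field names, and the exact response to an out-of-range address, are assumptions made only to illustrate the flow.

```c
/* Illustrative bounds check modeled on the X/Y memory address controllers. */
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint32_t x_min;   /* minimum memory limit (x min circuit 682) */
    uint32_t x_max;   /* maximum memory limit (x max circuit 690) */
} addr_limits_t;

/* Returns true when the computed address lies within the configured bounds. */
static bool address_in_bounds(uint32_t base, uint32_t offset,
                              const addr_limits_t *lim, uint32_t *out_addr)
{
    uint32_t addr = base + offset;      /* address calculation (units 670/672) */
    if (addr < lim->x_min || addr > lim->x_max)
        return false;                   /* limit comparison (units 678/680)    */
    *out_addr = addr;                   /* transfer circuit 692 forwards it    */
    return true;
}
```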
  • FIG. 16 is a logical block diagram of two fast (45 ns) 16 KB data caches which are used by the compilers as fast program stacks and register files for the XY math processor.
  • the byte-addressable data caches, a first data cache 700 and a second data cache 702, can also be used as fast array memories when required.
  • the Y memory utilizes two data caches 708 and 710.
  • the data caches can be loaded by the system controller through clocked gating means shown generally as 712 and 714.
  • the inputs to the gating means 712 and 714 can be from a variety of sources such as the X bus, the staging bus, or organizational processor.
  • the Y memory data caches 708 and 710 are gated to receive data signals and, like the X memory caches, can be used either as byte-addressable data caches or as fast array memories.
  • the gating means 712, 714, 716 and 722 are gated to enable the XY machine to then use the loaded data caches as active caches while the inactive caches are then loaded for subsequent use.
  • each of the dual data caches, 700 and 702 for the X machine and 708 and 710 for the Y machine may be used by the compilers as a fast program stack and as register files. In other modes, as selected by the various applications compilers, the byte-addressable data caches can also be used as fast array memories.
  • Each of the dual, 32-bit wide data caches 700 and 702 for the X machine and 708 and 710 for the Y machine can be loaded by the system controller and are duplicated to permit fast context switching through the switching means 712, 714, 720 and 722.
  • the data caches 700, 702, 708 and 710 can also be loaded by the math processor on a single location load or store basis (random access).
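The duplicated caches lend themselves to a double-buffering scheme, sketched below purely as an illustration: while the math processor works out of the active bank, the system controller preloads the inactive one, and a context switch amounts to swapping the two. The sizes mirror the 16 KB, 32-bit-wide figures above, but the interface itself is an assumption, not the hardware's.

```c
/* Minimal double-buffering sketch for the duplicated data caches. */
#include <stdint.h>
#include <string.h>

#define CACHE_WORDS (16 * 1024 / 4)      /* 16 KB cache, 32-bit words */

typedef struct {
    uint32_t bank[2][CACHE_WORDS];       /* duplicated data caches         */
    int active;                          /* bank in use by the XY machine  */
} dual_cache_t;

/* System controller loads the bank the math processor is NOT using. */
static void preload_inactive(dual_cache_t *c, const uint32_t *src, size_t n)
{
    memcpy(c->bank[1 - c->active], src, n * sizeof(uint32_t));
}

/* Fast context switch: simply flip which bank is active. */
static void swap_banks(dual_cache_t *c)
{
    c->active = 1 - c->active;
}
```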
  • the input data transfer mechanism from the data memory is a special 64-bit wide, 3-register deep, first-in-first-out buffer which allows flexibility in relative data movement between the math processor and the memory system.
  • Such an arrangement permits the organizational processor to move ahead of the arithmetic operation being executed by the user processor and the math processor thereof to acquire the next three data items, and while the math processor is utilizing these three data items, the organizational processor can be executing sequence control instructions and specific integer operations as required to keep the user processor operating at maximum efficiency.
  • the output data transfer mechanism between the user processor and data memory is by means of a single stage output buffer. Once the results of the programmed arithmetic operation which has been executed by the math processor have been transferred to the output buffer, the user processor continues with further processing while these results are written to main memory storage.
  • FIG. 17 illustrates the operation of an instruction sequencer which is utilized by the user processor for producing from its program memory the instruction stream required to control the A and XY processors.
  • the address is stored in register 900 and, when clocked, the addressing signals are applied to a program transaction table 902 which addresses a specific location in the program memory. The same information applied to the program transaction table 902 is loaded into an adder 901 for subsequent use if necessary.
  • the program memory 904 contains instruction streams comprising four 32-bit instructions which are stored in a 256 KB program memory accessed 16 bytes at a time.
  • the addressing signals cause a full 128-bit word instruction stream to be read out of the program memory 904, as illustrated by the shaded portion 906 in memory 904.
  • the instruction stream represented by shaded portion 906 is transferred via transmission means 910 into an instruction buffer 912.
  • the instruction buffer is required because each transfer of the instruction stream occurs in response to a single clock cycle. In response to the next clock cycle, the instruction buffer transfers its contents via transmission means 914 into rotating network 916, which preprocesses the instruction stream to place it into a format that determines whether any type of arithmetic operation is to be performed and, if so, which arithmetic processor, namely the XY machine or the A machine, is to perform the operation.
  • the first bit of the instruction stream in the rotating network 916 is sensed to determine whether an arithmetic process is required and, if not, that information is passed via connecting means 920 to an adder 923 which stores the same in the shift register 922 to indicate that no arithmetic operation is required. If an arithmetic operation is required, the first bit of the instruction stream contained in rotating network 916 is sampled to determine whether the arithmetic operation is required to be performed by the XY machine or the A machine. If the arithmetic calculation is to be executed by the XY machine, the appropriate instruction appears on output 930.
  • When the instruction designates that an arithmetic operation is to be performed by the A machine, that information is sensed by the X instruction width detecting means 932, which causes the instruction to be transferred via communication means 934 to a left shift register 940 at the same time the information is passed to the XY machine for decoding by the rotating network 916.
  • the XY machine decoding produces an X or Y control signal to cause the X machine segment or the Y machine segment to execute the assigned arithmetic operation.
  • the portion of the instruction which is applicable to the A machine is applied by output 942 to the A machine decoder which decodes that portion of the instruction stream to produce an A control signal which causes the A machine to execute the assigned arithmetic operation.
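The routing decision described over the last several paragraphs can be summarized with a short, hedged sketch: a 128-bit word carries four 32-bit instructions, and leading bits of each instruction decide whether an arithmetic operation is present and whether it belongs to the XY machine or the A machine. The specific bit positions and masks below are assumptions chosen only to illustrate the flow, not the actual encoding.

```c
/* Illustrative routing of instructions to the XY machine or the A machine.
 * Assumed encoding: bit 31 = "arithmetic operation present",
 * bit 30 = "1: XY machine, 0: A machine". */
#include <stdint.h>
#include <stdio.h>

enum target { NO_ARITH, XY_MACHINE, A_MACHINE };

static enum target route_instruction(uint32_t insn)
{
    if ((insn & 0x80000000u) == 0)
        return NO_ARITH;
    return (insn & 0x40000000u) ? XY_MACHINE : A_MACHINE;
}

int main(void)
{
    /* One 128-bit instruction word holds four 32-bit instructions. */
    uint32_t word128[4] = { 0xC0000001u, 0x80000002u, 0x00000003u, 0xC0000004u };

    for (int i = 0; i < 4; i++) {
        switch (route_instruction(word128[i])) {
        case XY_MACHINE: printf("slot %d -> XY machine\n", i); break;
        case A_MACHINE:  printf("slot %d -> A machine\n", i);  break;
        default:         printf("slot %d -> no arithmetic op\n", i); break;
        }
    }
    return 0;
}
```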
  • FIG. 18 is a block diagram representing the design of the compiler, which is capable of making all of the parallel expansion decisions at compile time instead of relying on complicated preprocessors and postprocessors to divide and distribute the program code among several coprocessors in parallel.
  • This relieves the user processor of the requirement of dividing and distributing the program code for execution and eliminates the need for additional hardware to accomplish the distribution and execution of the program code.
  • the structure of compilers for the user processor utilized in the integrated, multicomputer data processing system described in connection with FIGS. 4, 6 and 8 comprises several front end language processors such as a C compiler front end 730, a FORTRAN 77 compiler front end 732, a VAX FORTRAN compiler front end 734, and other compiler front ends illustrated by rectangles 736 and 738.
  • the front end compilers are applied to a global optimizer and user processor code generator 746 which functions as a single back-end common code generator and optimizer.
  • the primary languages preferred for use in the user processor of the present invention are C and FORTRAN 77, both of which are compatible with the UNIX standard.
  • the global optimizer and compute engine code generator produces a compute engine object code, or user processor object code, represented by arrow 748, which is loaded into a linker/loader system 756.
  • the output of the linker/loader 756 is an executable compute engine object code or an executable user processor object code appearing on lead 760.
  • the output from the linker/loader also appears on a second output 764 which is applied to a screen-oriented symbol debugger 766 for viewing by the programmer.
  • the global optimizer and compute engine code generator 746 uses the following optimizing techniques:
  • the compiler rearranges the normal sequence of program execution in order to make the most efficient use of the user processor.
  • the user processor can simultaneously execute a simple math operation such as an add, a complex math operation such as a multiply, and fetch/store operations from three memories (two data caches plus main memory). Since the user processor can simultaneously execute a number of programmed arithmetic operations, the optimizer rearranges the code to ideally perform all five operations in each and every cycle, as many times as possible. Rearranging the code may result in the user processor calculating parts of a problem out of normal sequence.
  • An example wherein the code is rearranged to calculate parts of a problem out of normal sequence is that, instead of calculating an entire mathematical formula on each pass of a 1,000-iteration loop, the compiler may arrange to calculate the first two terms of the expression 1000 times, put the results in a data cache table, go back and calculate the next two terms 1000 times, and so on.
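  • A minimal C sketch of this loop-splitting idea; the array names, the three-term expression, and the cache-resident table are hypothetical, chosen only to illustrate the transformation:

      /* Rearranged form of a 1,000-iteration loop over a hypothetical
       * three-term expression a*b + c*d + e*f; a conventional compiler would
       * evaluate the whole expression on every pass. */
      void split_loop(const double *a, const double *b, const double *c,
                      const double *d, const double *e, const double *f,
                      double *result)
      {
          double partial[1000];   /* stands in for a data cache table */

          /* First sweep: calculate the first two terms 1,000 times and park
           * the results in the table. */
          for (int i = 0; i < 1000; i++)
              partial[i] = a[i] * b[i] + c[i] * d[i];

          /* Second sweep: fold the remaining term into the saved partials. */
          for (int i = 0; i < 1000; i++)
              result[i] = partial[i] + e[i] * f[i];
      }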
  • the end result is that the calculations are performed in a highly efficient manner by the user processor's scientific processor.
  • the compiler functions to arrange the mix of operations to process the programmed arithmetic operations and to perform the maximum number of operations at the same time in a single clock cycle.
  • the compiler optimizes the arrangement such that the mix of operations does not result in the user processor's becoming idle during execution of programmed arithmetic operations or calculations as the result of a poor mix of coded, programmed arithmetic operations.
  • the optimization operations take place in the global optimizer and compute engine code generator 746, the second stage of the compile process, which enables the programmer to utilize any one of a number of front end languages as illustrated in FIG. 18.
  • the integrated, multicomputer data processing system of the present invention utilizes a modified version of the Berkeley 4.2bsd UNIX software in the preferred embodiment.
  • FIG. 19 illustrates how the modified Berkeley 4.2bsd UNIX software, which was originally designed to operate in a monolithic computer environment, has been converted to a multicomputer, time-sharing system which is capable of handling a plurality of users to perform scientific and engineering compute-intensive virtual applications involving simulation, signal processing, image processing, and other complex math processing.
  • the Berkeley 4.2bsd UNIX software includes support for a C shell Command Interpreter 802, UNIX utilities 804, and a wide range of scientific applications represented by rectangles 806, and is capable of use with the Lan and Datacomm network requests represented by rectangles 814.
  • the C shell Command Interpreter 802, UNIX utilities 804, applications 806 and Lan and Datacomm network requests 814 comprise the application layer of the operating system function.
  • a user, by means of a terminal 808, is able to interface through user interface 810 with either the C shell Command Interpreter 802, the UNIX utilities 804, or the applications 806.
  • the user can utilize an intelligent work station 816 which, through a user interface 822, can be operatively connected to the Lan and Datacomm network requests 814 in order to perform the desired operating system function.
  • Each of the above-described C shell Command Interpreter 802, UNIX utilities 804, applications 806, and Lan and Datacomm network requests 814 is applied via communication means represented by arrows 830 into the system call interface layer 832, which includes a UNIX system kernel or Kernel Processor 836.
  • the system call interface 832 provides the capability of distributing software tasks among input-output peripherals 846, user processors 848, and additional multibus & ETHERNET systems 852, which represent the hardware level for performing the distributed, parallel processing tasks.
  • the operating system function represented by FIG. 19 is extremely versatile and modular, allowing several user processors and work stations to be attached to the integrated, multicomputer data processing system at the same time, and is structured such that the users perceive a standard monolithic 4.2bsd UNIX environment for developing and executing their software.
  • the system call interface layer 832 separates the monolithic image of the application layer from the hardware layer where the tasks are distributed and executed in parallel.
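  • A minimal C sketch of the idea that the system call interface layer hides from the application which hardware unit actually runs a task; the function, its arguments, and the routing rule are hypothetical illustrations, not the actual kernel interface:

      /* Toy model of a dispatch decision behind a single system-call entry
       * point; names and the routing rule are hypothetical. */
      typedef enum { IO_PERIPHERAL, USER_PROCESSOR, NETWORK_SYSTEM } hw_unit;

      static hw_unit dispatch_task(int compute_intensive, int network_request)
      {
          if (network_request)
              return NETWORK_SYSTEM;    /* multibus & ETHERNET systems 852 */
          if (compute_intensive)
              return USER_PROCESSOR;    /* user processors 848             */
          return IO_PERIPHERAL;         /* input-output peripherals 846    */
      }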
  • COMMON/spacel/ n          declare the constant n at location spacel
    COMMON/spacer/ q,r,t      declare constants q, r & t, starting at location spacer (4 bytes each)
  • the user processor can process the above using the power of its integrated processor architecture to expand the code out in parallel and execute it through three specialized high speed processors.
  • the FORTRAN compiler for the user processor builds special 96-bit wide machine instructions that simultaneously direct the operations of three processors and several memories, namely the local and global memories, the instruction cache, the data caches, and the like.
  • the machine instructions are formed into two parallel streams: (1) an instruction for the organizational processor; and (2) an instruction for the scientific processor.
  • Each instruction has either an organizational processor part or a math processor part, or both.
  • the FORTRAN example is compiled below as an illustration:
    XY INSTRUCTIONS        A INSTRUCTIONS         EXPLANATION
    Movf1 FIFO,MX          PDF spacel+4000[R1]    Multiply (z+10)+k (next item at bottom of FIFO) times MX (r); calculate address & fetch y (goes into top of FIFO)
    Move MR,F2; FIFO,MX    Rdf spacer             Move the results of the previous multiply (MR), [(z+10)+k]x(r), to F2; move the bottom of the FIFO (t) to MX; and fetch constant q
  • the user processor assembly program sequence has 13 instructions.
  • the number of assembly instructions is not by itself a measure of processing time; the real measure of processing time is the number of machine cycles used and the cycle times of the machines required to complete the calculation.
  • four control fields, identified as A, B, C, and D, are assumed in the instructions. The four control fields are defined as follows:
  • Field A controls the simple and complex math processors and associated registers;
  • Field B controls data movements from/to data caches and the FIFO connection between math processors and the organizational processor;
  • Field C controls address calculations and registers for the organizational processor, as well as data movements to the FIFO connection;
  • Field D controls reads/writes from/to local or global memory, along with branching operations.
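  • A minimal C sketch of one way to picture a 96-bit instruction word carved into the four control fields; the equal 24-bit field widths are an assumption made for illustration, since the actual widths are not specified here:

      #include <stdint.h>

      typedef struct { uint8_t bytes[12]; } wide_instruction;   /* 96 bits */

      /* Extract field n (0 = A, 1 = B, 2 = C, 3 = D) as a 24-bit value.
       * Field A drives the math processors, B the data caches and FIFO,
       * C the address calculations, and D memory access and branching. */
      static uint32_t get_field(const wide_instruction *w, int n)
      {
          const uint8_t *p = &w->bytes[n * 3];
          return ((uint32_t)p[0] << 16) | ((uint32_t)p[1] << 8) | (uint32_t)p[2];
      }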
  • the user processor executes the dual instruction stream in parallel using the four control fields, as set out in a cycle-by-cycle table with columns CYC, FIELD A, FIELD B, FIELD C, FIELD D, and FIFO.
  • the user processor assembly program consisted of 13 instructions; because of the overlapping effect of the organizational processor, simple math processor, and complex math processor, the user processor required only 13 cycles to execute those 13 instructions.
  • the basic machine cycle time of the user processor is 143 nanoseconds. In the processor, one pass through the loop takes 1859 nanoseconds, providing an equivalent speed of 538,000 passes per second, or 6.99 MIPS and 2.69 MFLOPS.
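  • as a check on these figures, 13 cycles x 143 nanoseconds = 1859 nanoseconds per pass; 1/1859 nanoseconds is approximately 538,000 passes per second; 538,000 passes x 13 instructions is approximately 6.99 million instructions per second; and 538,000 passes x 5 floating-point operations (the per-pass count implied by the stated rates) is approximately 2.69 MFLOPS.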
  • the integrated, multicomputer data processing system is well designed for higher mathematics applications like simulation, signal processing, image processing, and the like.
  • the scientific user or an intelligent compiler must determine how to convert mathematical constructs like Fast Fourier Transforms (FFT), convolutions, function evaluations, and numerical integrations into a complex series of add/subtract/shift/multiply/divide instructions, with other operations like trig functions, square roots, and the like, processed by elaborate floating point subroutines.
  • each pass would require four multiply operations and six additions (comprising adding the two numbers in each of two sets of parentheses, plus the cumulative summing of x' and y' results at the end of each pass).
  • the user processor has a table of single precision floating point trigonometric values in ROM, each 4K entries long. A request for a sine or cosine value is a simple fetch from fast cache, which can be performed without any waiting time for a cycle.
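  • A minimal C sketch of the table-lookup idea; the indexing scheme, with 4096 entries spanning one full revolution, is an assumption for illustration, since only the 4K table length is stated above:

      #include <math.h>

      #define TABLE_SIZE 4096
      #define TWO_PI 6.283185307179586

      static float sine_table[TABLE_SIZE];
      static float cosine_table[TABLE_SIZE];

      /* Fill the tables once, standing in for the values held in ROM. */
      static void init_tables(void)
      {
          for (int i = 0; i < TABLE_SIZE; i++) {
              sine_table[i]   = (float)sin(TWO_PI * i / TABLE_SIZE);
              cosine_table[i] = (float)cos(TWO_PI * i / TABLE_SIZE);
          }
      }

      /* A trigonometric "call" reduces to a single table fetch. */
      static float fast_sin(unsigned i) { return sine_table[i % TABLE_SIZE]; }
      static float fast_cos(unsigned i) { return cosine_table[i % TABLE_SIZE]; }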
  • the user processor performs simultaneous adds and multiplies via the simple math processor and complex math processor. Thus, the six adds and four multiplies only take six cycles, for a total of approximately 14 cycles per pass.
  • the integrated, multicomputer data processing system architecture, operating system and compiler operations are designed such that there is a mixture of vector and nonvector operations to be processed.
  • the user processor actually operates as a high-speed processor due to the compiler's ability to rearrange the processing task to capitalize on more parallel opportunities in the function being calculated without any degradation in performance due to the mix of operations.

Abstract

An integrated, multicomputer data processing system comprises a control unit (108) having a system input and output (102) and at least one user processor unit (200) having a math processor (252). The control unit comprises an operating system processor (140) for scheduling and assigning processing tasks and for controlling the transfer of data and instruction addresses to each user processor unit (200). A system controller (154), operatively connected to the operating system processor (140), receives the control signals and the assigned tasks and applies them to a user processor unit (200). The data processing system further comprises a system memory bus (132) for transferring data and a transfer bus (136) for instructions and for bidirectional data transfer, connected to the system controller (154) and to each user processor unit (200).
EP19870900365 1985-10-24 1986-10-24 Systeme integre de traitement de donnees a ordinateurs multiples Withdrawn EP0244480A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US79087385A 1985-10-24 1985-10-24
US790873 1985-10-24

Publications (1)

Publication Number Publication Date
EP0244480A1 true EP0244480A1 (fr) 1987-11-11

Family

ID=25151986

Family Applications (1)

Application Number Title Priority Date Filing Date
EP19870900365 Withdrawn EP0244480A1 (fr) 1985-10-24 1986-10-24 Systeme integre de traitement de donnees a ordinateurs multiples

Country Status (3)

Country Link
EP (1) EP0244480A1 (fr)
JP (1) JPS63501904A (fr)
WO (1) WO1987002800A1 (fr)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2217060A (en) * 1988-03-23 1989-10-18 Benchmark Technologies Multi-processor system
GB2215878A (en) * 1988-03-23 1989-09-27 Benchmark Technologies Chip-independant numeric subsystem
GB2215877A (en) * 1988-03-23 1989-09-27 Benchmark Technologies Data processing system
JPH0219965A (ja) * 1988-07-08 1990-01-23 Hitachi Ltd データ管理方法とそのシステム
GB2244828B (en) * 1989-04-24 1993-09-01 Yokogawa Electric Corp Programmable controller
JP2834837B2 (ja) * 1990-03-30 1998-12-14 松下電工株式会社 プログラマブルコントローラ
DE4027324C2 (de) * 1990-08-29 1994-07-14 Siemens Ag Verfahren zum Betrieb eines Coprozessors in einem verteilten Rechnersystem
GB9117148D0 (en) * 1991-08-08 1991-09-25 Skimming William G 3-dimensional modular processing engine
DE4229710B4 (de) * 1991-09-09 2008-06-05 Samsung Electronics Co., Ltd. Digitales Audiodatenspeicherungssystem und damit ausgerüstetes digitales Audio-System
JP4236936B2 (ja) 2001-04-26 2009-03-11 ザ・ボーイング・カンパニー ネットワークバスを介して少なくとも1つのネットワークデバイスと通信するためのシステム及び方法

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS5437633B2 (fr) * 1973-02-12 1979-11-16
JPS5144551B2 (fr) * 1973-03-28 1976-11-29
US4123794A (en) * 1974-02-15 1978-10-31 Tokyo Shibaura Electric Co., Limited Multi-computer system
JPS5162838A (ja) * 1974-11-29 1976-05-31 Toray Industries Nannenseijushisoseibutsuno seizohoho
DE2546202A1 (de) * 1975-10-15 1977-04-28 Siemens Ag Rechnersystem aus mehreren miteinander verbundenen und zusammenwirkenden einzelrechnern und verfahren zum betrieb des rechnersystems
DE2641741C2 (de) * 1976-09-16 1986-01-16 Siemens AG, 1000 Berlin und 8000 München Rechenanlage aus mehreren miteinander über ein Sammelleitungssystem verbundenen und zusammenwirkenden Einzelrechnern und einem Steuerrechner
US4101960A (en) * 1977-03-29 1978-07-18 Burroughs Corporation Scientific processor
JPS55110153A (en) * 1979-02-16 1980-08-25 Mitsubishi Chem Ind Ltd Resin composition
US4302818A (en) * 1979-07-10 1981-11-24 Texas Instruments Incorporated Micro-vector processor

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO8702800A1 *

Also Published As

Publication number Publication date
JPS63501904A (ja) 1988-07-28
WO1987002800A1 (fr) 1987-05-07

Similar Documents

Publication Publication Date Title
Goodman et al. PIPE: a VLSI decoupled architecture
US5175863A (en) Signal data processing system having independently, simultaneously operable alu and macu
Smith Dynamic instruction scheduling and the Astronautics ZS-1
Pang A simulation study of decoupled architecture computers
CA1119731A (fr) Processeur multibus pour accroitre la vitesse d'execution en utilisant l'effet de pipeline
EP0260409B1 (fr) Système de traitement de données à deux unités d'exécution
Annaratone et al. Warp architecture and implementation
US5983336A (en) Method and apparatus for packing and unpacking wide instruction word using pointers and masks to shift word syllables to designated execution units groups
US3771141A (en) Data processor with parallel operations per instruction
Benitez et al. Code generation for streaming: An access/execute mechanism
US5121502A (en) System for selectively communicating instructions from memory locations simultaneously or from the same memory locations sequentially to plurality of processing
US5083267A (en) Horizontal computer having register multiconnect for execution of an instruction loop with recurrance
US4837678A (en) Instruction sequencer for parallel operation of functional units
US5276819A (en) Horizontal computer having register multiconnect for operand address generation during execution of iterations of a loop of program code
US5036454A (en) Horizontal computer having register multiconnect for execution of a loop with overlapped code
Gimarc et al. A survey of RISC processors and computers of the Mid-1980s
US5226128A (en) Horizontal computer having register multiconnect for execution of a loop with a branch
EP0244480A1 (fr) Systeme integre de traitement de donnees a ordinateurs multiples
US20030221086A1 (en) Configurable stream processor apparatus and methods
US5390306A (en) Pipeline processing system and microprocessor using the system
KR100267092B1 (ko) 멀티미디어신호프로세서의단일명령다중데이터처리
KAMIN III et al. Fast Fourier transform algorithm design and tradeoffs on the CM-2
EP0230383A2 (fr) Exécution séquentielle d'opérations arithmétiques sur des collections de données
Smith et al. The astronautics ZS-1 processor
JP2551163B2 (ja) 命令処理制御方式

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH DE FR GB IT LI LU NL SE

17P Request for examination filed

Effective date: 19871106

RBV Designated contracting states (corrected)

Designated state(s): DE FR GB SE

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 19890502

RIN1 Information on inventor provided before grant (corrected)

Inventor name: RICHARDSON, JOHN, L.

Inventor name: MCCAMMON, MICHAEL

Inventor name: PEARSON, ROBERT, B.

Inventor name: PROCTOR, WILLIAM, L.

Inventor name: CULLER, GLEN, J.