CA2515283A1 - Multiprocessor computer architecture incorporating a plurality of memory algorithm processors in the memory subsystem

Info

Publication number
CA2515283A1
CA2515283A1 CA002515283A CA2515283A CA2515283A1 CA 2515283 A1 CA2515283 A1 CA 2515283A1 CA 002515283 A CA002515283 A CA 002515283A CA 2515283 A CA2515283 A CA 2515283A CA 2515283 A1 CA2515283 A1 CA 2515283A1
Authority
CA
Canada
Prior art keywords
processor
memory
execution logic
direct execution
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CA002515283A
Other languages
French (fr)
Other versions
CA2515283C (en)
Inventor
Jon M. Huppenthal
Paul A. Leskar
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SRC Computers LLC
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US08/992,763 (US6076152A)
Application filed by Individual
Publication of CA2515283A1
Application granted
Publication of CA2515283C
Anticipated expiration
Expired - Lifetime

Landscapes

  • Advance Control (AREA)

Abstract

A multiprocessor computer architecture incorporating a plurality of programmable hardware memory algorithm processors (MAPs) in the memory subsystem. The MAP may comprise one or more field programmable gate arrays (FPGAs), which function to perform identified algorithms in conjunction with, and tightly coupled to, a microprocessor, and each MAP is globally accessible by all of the system processors for the purpose of executing user definable algorithms. A circuit within the MAP signals when the last operand has completed its flow, thereby allowing a given process to be interrupted and thereafter restarted. Through the use of read only memory (ROM), located adjacent to the FPGA, a user program may use a single command to select one of several possible pre-loaded algorithms, thereby decreasing system configuration time. A computer system structure MAP may function in normal or direct memory access (DMA) modes of operation and, in the latter mode, one device may feed results directly to another, thereby allowing pipelining or parallelizing execution of the user defined algorithm. The system also provides a user programmable performance monitoring capability and utilizes parallelizer software to automatically detect parallel regions of the user applications containing algorithms that can be executed in the programmable hardware.
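
The last-operand signalling mentioned in the abstract, and recited in claims 8 and 9 as a command decoder, a pipeline counter and an equality comparator feeding a status register, might be modeled roughly as follows. This is only an illustrative software sketch, not the patented circuit: the class name `ControlBlock`, the command string `"LAST_OPERAND"`, the helper `cycles_until_empty` and the assumption that the counter is reset to zero when the command is decoded are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ControlBlock:
    """Toy model of a MAP control block (all names are assumptions)."""

    pipeline_depth: int           # cycles an operand needs to drain from the pipeline
    counter: int = 0              # pipeline counter
    counting: bool = False        # set once a last-operand command has been decoded
    pipeline_empty: bool = False  # status-register bit driven by the comparator

    def __post_init__(self) -> None:
        if self.pipeline_depth < 1:
            raise ValueError("pipeline_depth must be at least 1")

    def decode(self, command: str) -> None:
        # Command decoder: only the hypothetical "LAST_OPERAND" command is modeled.
        if command == "LAST_OPERAND":
            self.counter = 0
            self.counting = True
            self.pipeline_empty = False

    def clock(self) -> None:
        # One clock cycle: advance the counter, then compare it with the depth.
        if self.counting and not self.pipeline_empty:
            self.counter += 1
            # Equality comparator: flag raised when counter == pipeline depth.
            self.pipeline_empty = self.counter == self.pipeline_depth


def cycles_until_empty(depth: int) -> int:
    """Issue the last-operand command, then poll the status bit every cycle."""
    block = ControlBlock(pipeline_depth=depth)
    block.decode("LAST_OPERAND")
    cycles = 0
    while not block.pipeline_empty:
        block.clock()
        cycles += 1
    return cycles
```

In this model a processor that polls the status bit sees the pipeline-empty flag exactly `pipeline_depth` cycles after the last-operand command, which is the point at which the process could safely be interrupted.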

Claims (107)

1. In a computer system having at least one data processor for executing an application program by operating on user data in accordance with application program instructions, said computer system having at least one memory bank with a data bus and an address bus connected to said at least one data processor, the improvement comprising:
a plurality of reconfigurable memory algorithm processors within individually addressable portions of said memory bank, means connecting said plurality of memory algorithm processors to said data bus and to said address bus such that said plurality of memory algorithm processors are individually memory addressable by said at least one data processor as said one data processor executes said application program;
and said plurality of memory algorithm processors being configured as individual data processing elements to perform data processing related to said application program in accordance with an identified algorithm, said data processing being performed on at least one operand that is received directly from said at least one data processor.
2. The improvement of claim 1 wherein each of said plurality of memory algorithm processors comprises a field programmable gate array.
3. The improvement of claim 1 wherein each of said plurality of memory algorithm processors is operative to memory address said memory bank independent of said at least one data processor.
4. The improvement of claim 1 wherein an identified algorithm is preprogrammed into each of said plurality of memory algorithm processors.
5. The improvement of claim 4 wherein a plurality of identified algorithms are preprogrammed into a memory device that is associated with said plurality of memory algorithm processors.
6. The improvement of claim 5 wherein said memory device comprises at least one read only memory device.
7. The improvement of claim 1 wherein any given one of said plurality of memory algorithm processors is operative to pass a data processing result of an operand that has been processed by an identified algorithm to another of said plurality of memory algorithm processors.
8. The improvement of claim 1 wherein said plurality of memory algorithm processors comprise a memory algorithm processor assembly, said memory algorithm processor assembly including:
a control block having a command decoder coupled to said address bus and having a pipeline counter coupled to said command decoder;
said command decoder for providing a last operand flag to said pipeline counter in response to a last operand command from an operating system of said at least one data processor.
9. The improvement of claim 8 wherein said control block further includes:
at least one status register;
and an equality comparator coupled to receive a pipeline depth signal and an output of said pipeline counter, said equality comparator for providing a pipeline empty flag to said at least one status register.
10. The improvement of claim 9 wherein said at least one status register is coupled to said command decoder to receive a register control signal and is coupled to said plurality of memory algorithm processors to receive a status signal, said at least one status register providing a status word output signal.
11. A multiprocessor computer system comprising:
a plurality of data processors for executing at least one application program by operating on user data in accordance with program instructions;
a memory bank having a data bus and an address bus connected to said plurality of data processors;
a plurality of reconfigurable memory algorithm processors within said memory bank at a plurality of individual memory addressable memory locations;
means coupling said plurality of individual memory algorithm processors to said data bus and to said address bus;
said plurality of reconfigurable memory algorithm processors being individually memory addressable by all of said plurality of data processors;
and said plurality of memory algorithm processors being individually configurable to perform an identified algorithm on an operand that is received from a write operation by one of said plurality of data processors to said memory bank as said at least one of said plurality of data processors executes said at least one application program.
12. The multiprocessor computer system of claim 11 wherein all of said plurality of memory algorithm processors are memory addressable by all of said plurality of data processors.
13. The multiprocessor computer system of claim 12 wherein all of said plurality of memory algorithm processors are mutually memory addressable.
14. The multiprocessor computer system of claim 13 wherein said plurality of memory algorithm processors collectively comprises a memory algorithm processor assembly, said memory algorithm processor assembly including:
a control block operative to provide a last operand flag in response to a last operand having been processed by said memory algorithm processor assembly.
15. The multiprocessor computer system of claim 11 including:
at least one memory device associated with said plurality of memory algorithm processors for storing a plurality of pre-loaded identified algorithms.
16. The multiprocessor computer system of claim 15 wherein said at least one memory device is responsive to a predetermined command from a data processor and operates in response thereto to select one of said plurality of pre-loaded identified algorithms to be implemented by an addressed one of said plurality of memory algorithm processors.
17. The multiprocessor computer system of claim 16 wherein said at least one memory device comprises at least one read only memory device.
18. The multiprocessor computer system of claim 11 wherein each of said plurality of memory algorithm processors comprises a field programmable gate array.
19. The multiprocessor computer system of claim 11 wherein each of said plurality of memory algorithm processors is memory accessible through normal memory access protocol.
20. The multiprocessor computer system of claim 11 wherein each of said plurality of memory algorithm processors has direct memory access capability to said memory bank.
21. The multiprocessor computer system of claim 11 wherein each of said plurality of memory algorithm processors is operative to pass a result of a processed operand to another memory algorithm processor.
22. The multiprocessor computer system of claim 11 operative to detect at least one parallel region of said at least one application program, wherein at least one of said plurality of memory algorithm processors is configured as a function of said detected at least one parallel region of said at least one application program.
23. A system for processing data using a plurality of reconfigurable processors, the system comprising:
a memory subsystem coupled to a data processor and including an addressable memory array;
a first reconfigurable processor within the memory subsystem and coupled to a first address in the addressable memory array, wherein responsive to a first data value being written at the first address, the first reconfigurable processor performs a first configured function, generates a second data value, and writes the second data value to a second address in the addressable memory array;
a second reconfigurable processor within the memory subsystem and coupled to the second address in the addressable memory array, wherein, responsive to the second data value being written at the second address, the second reconfigurable processor retrieves the second data and performs a second configured function;
a control logic block in the memory subsystem in the communication path between the data processor and the addressable memory array for accessing data at specified addresses within the addressable memory array;
a data bus and an address bus connecting the control logic block and the addressable memory array;
a communication path between the first reconfigurable processor and the address bus;
and a control block in the communication path between the first reconfigurable processor and the address bus, wherein the control block comprises a command decoder for decoding commands from the data processor, a pipeline counter for counting clock cycles, an equality comparator for determining whether the output of the pipeline counter corresponds to a predetermined number of clock cycles, and status registers for receiving an output from the equality comparator.
24. The system of claim 23 wherein the second reconfigurable processor generates a third data value.
25. The system of claim 23, further comprising a communication path between the first reconfigurable processor and the data bus.
26. The system of claim 23, wherein the data processor transmits commands over the address bus.
27. The system of claim 23, wherein the data processor periodically checks the status register.
28. A method of data processing using reconfigurable processors, the method comprising:
configuring a first reconfigurable processor within a memory subsystem to perform a first function;
configuring a second reconfigurable processor within a memory subsystem to perform a second function;
writing a first data value to a first memory address location in the memory subsystem;
reading the first data value into a first reconfigurable processor within the memory subsystem;
performing the first function in the first reconfigurable processor using the first data value to generate a second data value;
writing the second data value to a second memory address within the memory subsystem;
reading the second data value into a second reconfigurable processor within the memory subsystem;
performing the second function in the second reconfigurable processor using the second data value to generate a third data value;
receiving a command to terminate the data processing;
counting the number of clock cycles that have elapsed since the command was received;
and generating a signal when a predetermined number of clock cycles has passed.
29. The method of claim 28 wherein the third data value is written to a third memory location in the memory subsystem.
30. The method of claim 28 wherein performing the first function includes multiplying.
31. The method of claim 29 wherein configuring the first reconfigurable processor includes a fixed instruction set processor selecting configuration bits corresponding to the first function.
32. The method of claim 31 wherein the fixed instruction set processor performs a math function.
33. The method of claim 32 wherein the math function is a 64-bit floating point math function.
34. The method of claim 31 further comprising:
signaling the fixed instruction set processor when the third data value is available.
35. The method of claim 34 wherein the signaling includes writing a status value to a status register.
36. The method of claim 28 wherein writing the second data value includes operatively passing the second data value from the first reconfigurable function unit to the second reconfigurable function unit.
37. A computer system comprising:
at least one processor;
at least one circuit of direct execution logic;
a common memory space accessible by said at least one processor and said at least one circuit of direct execution logic;
and a unified executable program comprising a first portion thereof executable by said at least one processor and a second portion thereof executable by said at least one circuit of direct execution logic;
wherein said at least one circuit of direct execution logic is programmed to perform at least one identified algorithm on an operand received from said common memory space.
38. The computer system of claim 37, wherein said at least one processor comprises a microprocessor.
39. The computer system of claim 37, wherein said at least one circuit of direct execution logic comprises at least one field programmable gate array.
40. The computer system of claim 37, wherein said at least one circuit of direct execution logic is operative to access said common memory space independently of said at least one processor.
41. The computer system of claim 37, wherein said at least one identified algorithm is programmed into a memory device associated with said circuit of direct execution logic.
42. The computer system of claim 41, wherein said memory device comprises at least one read only memory device.
43. The computer system of claim 37, wherein said first portion of said unified executable program executable by said at least one processor is resident in said common memory space.
44. The computer system of claim 37, wherein said second portion of said unified executable program is resident in said at least one circuit of direct execution logic.
45. The computer system of claim 37, wherein said second portion of said unified executable program is resident in said at least one field programmable gate array.
46. The computer system of claim 37, wherein said at least one processor comprises a fixed instruction set processor.
47. A method for operating a computer system comprising:
providing at least one processor;
providing at least one circuit of direct execution logic;
enabling access by said at least one processor and said at least one circuit of direct execution logic to a common memory space;
executing a unified executable program on said computer system such that a first portion of said unified executable program is executable by said at least one processor and a second portion of said unified executable program is executable by said at least one circuit of direct execution logic;
wherein said common memory space is accessible by said at least one circuit of direct execution logic independently of said at least one processor.
48. The method of claim 47, wherein said step of providing at least one processor is carried out by a microprocessor.
49. The method of claim 47, wherein said step of providing at least one processor is carried out by a fixed instruction set processor.
50. The method of claim 47, wherein said step of providing at least one circuit of direct execution logic is carried out by at least one field programmable gate array.
51. The method of claim 47, further comprising:
programming said at least one circuit of direct execution logic to perform at least one identified algorithm received from said common memory space.
52. The method of claim 51, further comprising:
storing said at least one identified algorithm in a memory device associated with said circuit of direct execution logic.
53. The method of claim 52, wherein said step of storing said at least one identified algorithm is carried out by a read only memory device.
54. The method of claim 47, further comprising:
storing said first portion of said unified executable program in said common memory space.
55. The method of claim 47, further comprising:
storing said second portion of said unified executable program in said at least one circuit of direct execution logic.
56. The method of claim 47, further comprising:
storing said second portion of said unified executable program in said at least one field programmable gate array.
57. A system for processing data using a plurality of circuits of direct execution logic, said system comprising:
at least one processor;
a common memory space coupled to said at least one processor and said plurality of circuits of direct execution logic;
a first one of said plurality of circuits of direct execution logic coupled to a first address in said common memory space and responsive to a first data value being written to said first address, said first one of said plurality of circuits of direct execution logic performing a first configured function in accordance with a unified executable program, generating a second data value and writing said second data value to a second address in said common memory space;
a second one of said plurality of circuits of direct execution logic coupled to said second address in said common memory space and responsive to said second data value being written to said second address, said second one of said plurality of circuits of direct execution logic retrieving said second data value and performing a second configured function in accordance with said unified executable program;
a first control logic block in a first communication path between said at least one processor and said common memory space for accessing data at specified addresses within said common memory space;
a data bus and an address bus coupling said control logic block and said common memory space;
a third communication path between said first one of said plurality of circuits of direct execution logic and said address bus;
a second control logic block in said third communication path between said first one of said plurality of circuits of direct execution logic and said address bus;
where said second control logic block comprises a command decoder for decoding commands from said at least one processor, a pipeline counter for counting clock cycles, an equality comparator for determining whether an output of said pipeline counter corresponds to a predetermined number of said clock cycles and status registers for receiving an output from said equality comparator.
58. The system of claim 57, wherein said second one of said plurality of circuits of direct execution logic generates a third data value.
59. The system of claim 57, further comprising a second communication path between said first one of said plurality of circuits of direct execution logic and said data bus.
60. The system of claim 57, wherein said at least one processor transmits commands on said address bus.
61. The system of claim 57, wherein said at least one processor periodically accesses said status register.
62. The system of claim 57, wherein said first and second ones of said plurality of circuits of direct execution logic comprise field programmable gate arrays.
63. The system of claim 57, wherein said first and second ones of said plurality of circuits of direct execution logic are operative to access said common memory space independently of said at least one processor.
64. The system of claim 57, wherein said first one of said plurality of circuits of direct execution logic is programmed to perform at least one identified algorithm on an operand received from said common memory space.
65. The system of claim 64, wherein said at least one identified algorithm is programmed into a memory device associated with said first one of said plurality of circuits of direct execution logic.
66. The system of claim 65, wherein said memory device comprises at least one read only memory device.
67. The system of claim 57, wherein a first portion of said unified executable program is resident in said common memory space for execution by said at least one processor.
68. The system of claim 57, wherein a second portion of said unified executable program is resident in said first one of said plurality of circuits of direct execution logic.
69. The system of claim 57, wherein said at least one processor comprises a fixed instruction set processor.
70. A method for processing data utilizing circuits of direct execution logic coupled to a common memory space, said method comprising:
configuring a first circuit of direct execution logic to perform a first function;
configuring a second circuit of direct execution logic to perform a second function;
writing a first data value to a first memory address location in said common memory space;
reading said first data value into said first circuit of direct execution logic;
performing said first function in said first circuit of direct execution logic using said first data value to generate a second data value;
writing said second data value to a second memory address within said common memory space;
reading said second data value into said second circuit of direct execution logic;
performing said second function in said second circuit of direct execution logic using said second data value to generate a third data value;
receiving a command to terminate processing of said data;
counting a number of clock cycles that have elapsed since said command was received;
and generating a signal when a predetermined number of clock cycles has passed.
71. The method of claim 70, wherein said third data value is written to a third memory location in said common memory space.
72. The method of claim 70, wherein performing said first function includes multiplying.
73. The method of claim 72, wherein configuring said first circuit of direct execution logic includes at least one processor selecting configuration bits corresponding to said first function.
74. The method of claim 73, wherein said at least one processor comprises a fixed instruction set processor.
75. The method of claim 73, wherein said at least one processor performs a math function.
76. The method of claim 75, wherein said math function comprises a 64-bit floating point math function.
77. The method of claim 73, further comprising:
signaling said at least one processor when said third data value is available.
78. The method of claim 77, wherein said signaling said at least one processor includes writing a status value to a status register.
79. The method of claim 70, wherein writing said second data value includes operatively passing said second data value from said first circuit of direct execution logic to said second circuit of direct execution logic.
80. The method of claim 73, wherein said configuring said first circuit of direct execution logic is carried out in accordance with a unified executable program.
81. The method of claim 73, wherein said at least one processor is operative in accordance with said unified executable program.
82. A computer system comprising:
at least one processor;
at least one circuit of direct execution logic;
a common memory space accessible by said at least one processor and said at least one circuit of direct execution logic;
and a unified executable program comprising a first portion thereof executable by said at least one processor and a second portion thereof executable by said at least one circuit of direct execution logic;
wherein said at least one circuit of direct execution logic is operative to access said common memory space independently of said at least one processor.
83. The computer system of claim 82, wherein said at least one processor comprises a microprocessor.
84. The computer system of claim 82, wherein said at least one circuit of direct execution logic comprises at least one field programmable gate array.
85. The computer system of claim 82, wherein said at least one circuit of direct execution logic is programmed to perform at least one identified algorithm on an operand received from said common memory space.
86. The computer system of claim 85, wherein said at least one identified algorithm is programmed into a memory device associated with said circuit of direct execution logic.
87. The computer system of claim 86, wherein said memory device comprises at least one read only memory device.
88. The computer system of claim 82, wherein said first portion of said unified executable program executable by said at least one processor is resident in said common memory space.
89. The computer system of claim 82, wherein said second portion of said unified executable program is resident in said at least one circuit of direct execution logic.
90. The computer system of claim 82, wherein said second portion of said unified executable program is resident in said at least one field programmable gate array.
91. The computer system of claim 82, wherein said at least one processor comprises a fixed instruction set processor.
92. A system for processing data using a plurality of circuits of direct execution logic, said system comprising:
at least one processor;
a common memory space coupled to said at least one processor and said plurality of circuits of direct execution logic;
a first one of said plurality of circuits of direct execution logic coupled to a first address in said common memory space and responsive to a first data value being written to said first address, said first one of said plurality of circuits of direct execution logic performing a first configured function in accordance with a unified executable program, generating a second data value and writing said second data value to a second address in said common memory space;
and a second one of said plurality of circuits of direct execution logic coupled to said second address in said common memory space and responsive to said second data value being written to said second address, said second one of said plurality of circuits of direct execution logic retrieving said second data value and performing a second configured function in accordance with said unified executable program;
wherein said first and second ones of said plurality of circuits of direct execution logic are operative to access said common memory space independently of said at least one processor.
93. The system of claim 92, further comprising:
a first control logic block in a first communication path between said at least one processor and said common memory space for accessing data at specified addresses within said common memory space.
94. The system of claim 93, further comprising a data bus and an address bus coupling said control logic block and said common memory space.
95. The system of claim 94, further comprising a second communication path between said first one of said plurality of circuits of direct execution logic and said data bus.
96. The system of claim 94, further comprising a third communication path between said first one of said plurality of circuits of direct execution logic and said address bus.
97. The system of claim 96, further comprising a second control logic block in said third communication path between said first one of said plurality of circuits of direct execution logic and said address bus.
98. The system of claim 97, where said second control logic block comprises a command decoder for decoding commands from said at least one processor, a pipeline counter for counting clock cycles, an equality comparator for determining whether an output of said pipeline counter corresponds to a predetermined number of said clock cycles and status registers for receiving an output from said equality comparator.
99. The system of claim 98, wherein said at least one processor transmits commands on said address bus.
100. The system of claim 98, wherein said at least one processor periodically accesses said status register.
101. The system of claim 92, wherein said first and second ones of said plurality of circuits of direct execution logic comprise field programmable gate arrays.
102. The system of claim 92, wherein said first one of said plurality of circuits of direct execution logic is programmed to perform at least one identified algorithm on an operand received from said common memory space.
103. The system of claim 102, wherein said at least one identified algorithm is programmed into a memory device associated with said first one of said plurality of circuits of direct execution logic.
104. The system of claim 103, wherein said memory device comprises at least one read only memory device.
105. The system of claim 92, wherein a first portion of said unified executable program is resident in said common memory space for execution by said at least one processor.
106. The system of claim 92, wherein a second portion of said unified executable program is resident in said first one of said plurality of circuits of direct execution logic.
107. The system of claim 92, wherein said at least one processor comprises a fixed instruction set processor.
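
The write-triggered chaining recited in claims 23, 28, 57, 70 and 92 (a first value written to a first address, a first function producing a second value at a second address, and a second function producing a third value) can be illustrated with a small software model. The sketch below is an assumption-laden illustration only: `SharedMemory`, `attach_stage`, `run_pipeline`, the three addresses and the two example functions are hypothetical and do not come from the patent, apart from the first function being a multiplication as in claims 30 and 72.

```python
from typing import Callable, Dict


class SharedMemory:
    """Toy common memory space in which a write to a watched address triggers logic."""

    def __init__(self) -> None:
        self.cells: Dict[int, float] = {}
        self.watchers: Dict[int, Callable[[float], None]] = {}

    def attach(self, address: int, handler: Callable[[float], None]) -> None:
        # Couple a processing element to one address.
        self.watchers[address] = handler

    def write(self, address: int, value: float) -> None:
        self.cells[address] = value
        handler = self.watchers.get(address)
        if handler is not None:
            handler(value)  # the coupled element reads the value and runs


def attach_stage(memory: SharedMemory, in_addr: int, out_addr: int,
                 function: Callable[[float], float]) -> None:
    """Configure one stage: read from in_addr, apply function, write to out_addr."""
    def on_write(value: float) -> None:
        memory.write(out_addr, function(value))
    memory.attach(in_addr, on_write)


# Hypothetical addresses for the first, second and third data values.
FIRST_ADDR, SECOND_ADDR, THIRD_ADDR = 0x100, 0x108, 0x110


def run_pipeline(first_value: float) -> float:
    memory = SharedMemory()
    attach_stage(memory, FIRST_ADDR, SECOND_ADDR, lambda x: x * 3.0)  # first function: multiply
    attach_stage(memory, SECOND_ADDR, THIRD_ADDR, lambda x: x + 1.0)  # second function: add
    memory.write(FIRST_ADDR, first_value)  # the processor's only write
    return memory.cells[THIRD_ADDR]        # third data value
```

Here the processor performs a single write; the second stage runs because the first stage wrote its result to the address the second stage is coupled to, without the processor relaying the intermediate value.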
CA2515283A 1997-12-17 1998-12-03 Multiprocessor computer architecture incorporating a plurality of memory algorithm processors in the memory subsystem Expired - Lifetime CA2515283C (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US08/992,763 US6076152A (en) 1997-12-17 1997-12-17 Multiprocessor computer architecture incorporating a plurality of memory algorithm processors in the memory subsystem
US08/992,763 1997-12-17
CA002313462A CA2313462C (en) 1997-12-17 1998-12-03 Multiprocessor computer architecture incorporating a plurality of memory algorithm processors in the memory subsystem

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CA002313462A Division CA2313462C (en) 1997-12-17 1998-12-03 Multiprocessor computer architecture incorporating a plurality of memory algorithm processors in the memory subsystem

Publications (2)

Publication Number Publication Date
CA2515283A1 (en) 1999-06-24
CA2515283C (en) 2011-04-05

Family

ID=35253791

Family Applications (1)

Application Number Title Priority Date Filing Date
CA2515283A Expired - Lifetime CA2515283C (en) 1997-12-17 1998-12-03 Multiprocessor computer architecture incorporating a plurality of memory algorithm processors in the memory subsystem

Country Status (1)

Country Link
CA (1) CA2515283C (en)

Also Published As

Publication number Publication date
CA2515283C (en) 2011-04-05

Similar Documents

Publication Publication Date Title
US6961841B2 (en) Multiprocessor computer architecture incorporating a plurality of memory algorithm processors in the memory subsystem
US5822244A (en) Method and apparatus for suspending a program/erase operation in a flash memory
US4942519A (en) Coprocessor having a slave processor capable of checking address mapping
US6131139A (en) Apparatus and method of simultaneously reading and writing data in a semiconductor device having a plurality of flash memories
US6401197B1 (en) Microprocessor and multiprocessor system
JPH0798692A (en) Microcomputer
US4926318A (en) Micro processor capable of being connected with a coprocessor
EP0139254A2 (en) Apparatus and method for direct memory to peripheral and peripheral to memory data transfer
US4628445A (en) Apparatus and method for synchronization of peripheral devices via bus cycle alteration in a microprocessor implemented data processing system
US5752066A (en) Data processing system utilizing progammable microprogram memory controller
US6622244B1 (en) Booting from a reprogrammable memory on an unconfigured bus by modifying boot device address
US5664156A (en) Microcontroller with a reconfigurable program status word
US4814977A (en) Apparatus and method for direct memory to peripheral and peripheral to memory data transfers
CA2515283A1 (en) Multiprocessor computer architecture incorporating a plurality of memory algorithm processors in the memory subsystem
RU2110088C1 (en) Parallel processor with soft-wired structure
US20020004877A1 (en) Method and system for updating user memory in emulator systems
US7487287B2 (en) Time efficient embedded EEPROM/processor control method
EP0138045A2 (en) Apparatus and method for synchronization of peripheral devices via bus cycle alteration in a microprocessor implemented data processing system
CA1165454A (en) Odd byte memory accessing in data processing apparatus
JPH04155454A (en) Information processor
JPH0497459A (en) Cache coincidence processing system
JPH05151020A (en) Digital signal processor
JPH03282953A (en) Semiconductor device having device selecting circuit
JP2004171232A (en) Integrated circuit

Legal Events

Date Code Title Description
EEER Examination request
MKEX Expiry

Effective date: 20181203