US20170212739A1 - Processor With Reconfigurable Pipelined Core And Algorithmic Compiler - Google Patents
- Publication number
- US20170212739A1 (application US15/416,972)
- Authority
- US
- United States
- Prior art keywords
- processor
- reconfigurable
- core
- icat
- cores
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
- G06F: ELECTRIC DIGITAL DATA PROCESSING (section G, PHYSICS; class G06, COMPUTING; CALCULATING OR COUNTING)
  - G06F8/41: Compilation
  - G06F13/4022: Coupling between buses using switching circuits, e.g. switching matrix, connection or expansion network
  - G06F9/3867: Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines
  - G06F15/7867: Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture
  - G06F15/7889: Reconfigurable logic implemented as a co-processor (under G06F15/7885, Runtime interface, e.g. data exchange, runtime control)
  - G06F30/331: Design verification, e.g. functional simulation or model checking using simulation with hardware acceleration, e.g. by using field programmable gate array [FPGA] or emulation
  - G06F8/447: Target code generation
  - G06F9/3001: Arithmetic instructions
  - G06F9/3802: Instruction prefetching
  - G06F9/3885: Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
- Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT] (section Y, GENERAL TAGGING; class Y02)
  - Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- The field relates to computer programming and microprocessor design and programming, especially reconfigurable, pipelined and parallel processing of general purpose software instructions.
- FIG. 1A illustrates a conventional processor's compiler.
- Conventional processors such as Intel micro-processors and ARM micro-processors are well known.
- A conceptual illustration of a conventional processor is shown in FIG. 1B.
- These processors are the heart of central processing units for modern computers and devices and are used to process algorithms.
- A problem with conventional processors is that they are general purpose and are not reconfigurable in any practical way that would allow their performance to be enhanced for specific applications.
- Another problem is that program execution control adds substantial overhead to the processing of algorithmic functions, such as mathematical operations and logical decisions that modify the flow of processing.
- A higher-level programming language may be used to program the conventional processor, and the compiler converts the instructions in the higher-level programming language into machine code for the particular processor architecture.
- This machine code is provided to a memory location accessible by the processor and provides instructions for operation of the processor hardware, together with any BIOS or other calls provided by the system architecture.
- Mathematical and logical processing instructions are directed to an arithmetic logic unit (ALU), which returns a solution to a program execution control portion of the processor. That portion manages overhead, such as guiding the processor through the correct order of solving mathematical algorithms, making logic decisions, handling data and the like.
- Machine code instructions are continuously fetched from program storage in order to control the processing of data. This overhead significantly limits machine performance.
- The following illustrates the steps a conventional compiler takes to compile a mathematical operation written in the “C” programming language, an example of a higher-level programming language that may be compiled to create machine code for a particular conventional processor.
- A simple mathematical operation declares “var i1;”, “var i2;” and “var s;” to define data storage locations for the variables i1 and i2 and the result s.
- The compiler (a) first assigns storage locations for the data (e.g. i1, i2 and s) and (b) translates the source code into machine code.
- A conventional processor would retrieve all or a portion of the machine code from the memory location in which the code is stored and then execute it.
- The central processing unit (CPU) would load the i1 data from its memory location and send it to the ALU, load the i2 data from its memory location and send it to the ALU, and instruct the ALU to add the two values. Only then would the ALU add the values stored at the data locations for i1 and i2. This addition is the useful work; the setup by the CPU is overhead. Then the CPU could get the ALU result from the data location for “s” and send it to the input and output controller.
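To make the split between overhead and useful work concrete, here is a minimal Python sketch of the sequence just described for s = i1 + i2. The instruction names (LOAD, ADD, STORE, OUT) and the one-count-per-instruction accounting are assumptions for illustration, not the instruction set of any real processor.

```python
# Hypothetical toy model of the load/add/store/output sequence for s = i1 + i2.
# Instruction names and step accounting are illustrative assumptions.


def run_conventional(memory: dict) -> tuple[int, int, int]:
    """Execute s = i1 + i2 step by step; return (s, overhead_steps, useful_steps)."""
    program = [
        ("LOAD", "i1"),   # CPU fetches i1 and sends it to the ALU
        ("LOAD", "i2"),   # CPU fetches i2 and sends it to the ALU
        ("ADD", None),    # the only step that does useful arithmetic
        ("STORE", "s"),   # CPU writes the ALU result back to s
        ("OUT", "s"),     # CPU sends s to the input and output controller
    ]
    alu_inputs = []
    result = None
    overhead = useful = 0
    for op, operand in program:          # each iteration is one fetched instruction
        if op == "LOAD":
            alu_inputs.append(memory[operand])
            overhead += 1
        elif op == "ADD":
            result = alu_inputs[0] + alu_inputs[1]
            useful += 1
        elif op == "STORE":
            memory[operand] = result
            overhead += 1
        elif op == "OUT":
            print(f"{operand} = {memory[operand]}")
            overhead += 1
    return memory["s"], overhead, useful


s, overhead, useful = run_conventional({"i1": 2, "i2": 3, "s": 0})
print(f"useful steps: {useful}, overhead steps: {overhead}")
```

In this toy accounting only one of the five fetched steps performs arithmetic; the others move data on the CPU's behalf.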
- ASICs: application specific integrated circuits
- FPGAs: field programmable gate arrays
- RAM: random access memory
- FIG. 2 illustrates a structure for such an FPGA architecture.
- U.S. Pat. No. 5,684,980 discloses an array of FPGAs that changes configurations successively during performance of successive algorithms or instructions. Configuring the array of FPGAs allows an entire algorithm or set of instructions to be performed without waiting for each instruction to be downloaded at each computational step.
- FIG. 2 illustrates a block diagram of a virtual computer including an array of field programmable gate arrays and field programmable interconnection devices (FPIN) or cross-bar switches that relieve internal resources of the field programmable gate arrays from any external connection tasks, as disclosed in U.S. Pat. No. 5,684,980, the disclosure and drawings of which are hereby incorporated herein in their entirety for the purpose of disclosing the knowledge of a skilled artisan, familiar with FPGAs.
- FIG. 2 illustrates an array of field programmable gate arrays and field programmable interconnection devices that are arranged and employed as a co-processor to enhance the performance of a host computer or within a virtual computer processor to perform successive algorithms.
- The successive algorithms must be programmed to correspond with a series of conventional instructions that would normally be executed in a conventional microprocessor. Then, the rate of performing the specific computational task of the successive algorithms by the FPGA/FPIN array is much less than the rate of the corresponding instructions performed by a conventional microprocessor.
- The virtual computer of FIG. 2 must include a reconfigurable control section that governs the reconfiguration of the FPGA/FPIN array.
- The configuration bit files must be generated for the reconfigurable control section using a software package designed for that purpose.
- FIG. 2 illustrates how the arrays and dual port random access memory (RAM) are connected by pins to the reconfigurable control section, a bus interface and computer main memory.
- The bus interface is connected to a system bus.
- U.S. Pat. No. 5,684,980 shows how the pins provide a clock pin and a pin connecting the reconfigurable control section to the FPGA/FPIN arrays, and shows an example of a reconfigurable control section.
- U.S. Pat. No. 4,291,372 discloses a microprocessor system with specialized instruction formatting which works in conjunction with an external application dependent logic module handling specific requirements for data transfer to and from a peripheral device.
- The microprocessor provides a program memory having a specialized instruction format.
- The instruction word format provides a single-bit field for selecting either a program counter or a memory reference register as the source of the memory address, a function field that defines the route of data transfers to be made, and source and destination fields for addressing source and destination locations.
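The instruction word format summarized above can be sketched as a packed bit field. The 16-bit word, the 1/3/6/6 field widths and the class and field names are assumptions made for illustration; the summary above does not give them.

```python
from dataclasses import dataclass

# Hypothetical field widths (not taken from the patent): 1 + 3 + 6 + 6 = 16 bits.
ADDR_SEL_BITS, FUNC_BITS, SRC_BITS, DST_BITS = 1, 3, 6, 6


def _mask(bits: int) -> int:
    return (1 << bits) - 1


@dataclass
class InstructionWord:
    use_mem_ref_reg: bool  # False: program counter is the address source; True: memory reference register
    function: int          # defines the route of the data transfer
    source: int            # source location address
    destination: int       # destination location address

    def encode(self) -> int:
        """Pack the fields, most significant first: select | function | source | destination."""
        word = int(self.use_mem_ref_reg) & _mask(ADDR_SEL_BITS)
        word = (word << FUNC_BITS) | (self.function & _mask(FUNC_BITS))
        word = (word << SRC_BITS) | (self.source & _mask(SRC_BITS))
        word = (word << DST_BITS) | (self.destination & _mask(DST_BITS))
        return word

    @staticmethod
    def decode(word: int) -> "InstructionWord":
        """Unpack the fields in the reverse order they were packed."""
        destination = word & _mask(DST_BITS)
        word >>= DST_BITS
        source = word & _mask(SRC_BITS)
        word >>= SRC_BITS
        function = word & _mask(FUNC_BITS)
        word >>= FUNC_BITS
        return InstructionWord(bool(word & _mask(ADDR_SEL_BITS)), function, source, destination)


insn = InstructionWord(use_mem_ref_reg=True, function=5, source=12, destination=33)
word = insn.encode()
print(f"{word:016b}")               # 1101001100100001
print(InstructionWord.decode(word))
```

A single format word like this lets one instruction specify both where the address comes from and where the data goes, which is the point of the specialized format.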
- Peripheral controller units burdened the system with processor and control circuits in the base module for handling these specific requirements.
- Digital Signal Processing (DSP) units or arrays of DSP processors may be hardwired into parallel arrays that optimize performance for some graphic intensive tasks, such as pixel processing for generating images on output screens, such as monitors and televisions. These are custom made and include a BIOS specific to the graphical acceleration environment created for the digital signal processors to do their job.
- Matrix bus switching is known.
- The user guide “AMBA® 4 AXI4™, AXI4-Lite™, and AXI4-Stream™ Protocol Assertions, Revision: r0p1, User Guide,” copyright 2010, 2012, referenced as ARM DUI 0534B, ID072312, teaches a system for matrix bus switching that is high speed and implementable by a person having ordinary skill in the art.
- The user guide is written for system designers, system integrators, and verification engineers who want to confirm that a design complies with a relevant AMBA 4 protocol.
- This can be AXI4, AXI4-Lite, or AXI4-Stream, for example. All of the trademarks are registered trademarks of ARM in the EU and elsewhere. Where excepted, this reference is incorporated herein in its entirety by reference.
- An MBS is a high speed bus for data input and output, and this reference teaches the methods and hardware for a system engineer to integrate an example of an MBS in a processor system architecture.
- A pipelined, parallel processor on a chip comprises a processing unit and an array of reconfigurable, field programmable gates programmed by an algorithmic matching pipelined compiler, which can be a precompiler. The algorithmic matching pipelined compiler precompiles source code designed for operation on a standard processor without parallel processing, for processing by the processing unit, and the processing unit and the algorithmic matching pipelined compiler (referred to as AMPC or ASML) configure the field programmable gates to operate as pipelined, parallel processors.
- The processor may be referred to as a reusable algorithmic pipelined core (RAPC).
- The parallel processors are configured to complete tasks without any further overhead from the processing unit, such as overhead for controlling an arithmetic processing unit.
- A reusable algorithmic pipelined processor comprises a pool of computers configured to process algorithms in parallel using standard higher-level software languages, such as “C”, “C++” or the like.
- The pool of computers is reprogrammed to run different algorithms as needed for a particular calculation, based on the output of the AMPC, which is set up with the RAPC resources available to it.
- A reusable algorithmic pipelined core may comprise three modules: an intelligent bus controller or logical decision processor (LDP), a digital signal processor (DSP), and a matrix bus switch.
- LDP: logical decision processor
- A logical decision processor (LDP) comprises reconfigurable logic functions, reprogrammable as needed, for controlling a master bus switch (MBS).
- MBS: master bus switch
- A DSP comprises a reconfigurable mathematical processor for performing mathematical operations. In one example, all of the mathematical operations processed by the RAPC are processed by the DSP. In one example, all of the logic functions processed by the RAPC are processed by the LDP.
- A matrix bus router or switch (MBR or MBS) is defined as a reconfigurable, programmable circuit that routes data and results from one RAPC to another RAPC and from/to an input/output controller and/or interrupt generators, as required, to complete an algorithm without any further intervention from a central or peripheral processor during the processing of the algorithm.
- MBR or MBS: matrix bus router or switch
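The effect of a configured MBS can be sketched in Python. This is a sequential software simulation, not a model of real switch hardware. The two RAPC operations, the routing-table format and the function name run_stream are assumptions for illustration. It shows the point made above: once the routes are written at setup time, each result moves from RAPC to RAPC and out to the I/O controller with no central-processor step in between.

```python
from typing import Callable

# Each hypothetical "RAPC" is modeled as a configured two-input operation.
rapcs: dict[str, Callable[[int, int], int]] = {
    "rapc0": lambda x, y: x * y,   # configured as a multiplier
    "rapc1": lambda x, y: x + y,   # configured as an adder
}

# Hypothetical MBS configuration, written once at setup: where each result goes next.
route = {"rapc0": "rapc1", "rapc1": "io"}


def run_stream(samples):
    """Compute a*b + c for each (a, b, c) by following the configured routes."""
    outputs = []
    for a, b, c in samples:
        node, value = "rapc0", rapcs["rapc0"](a, b)
        while route[node] != "io":
            node = route[node]
            value = rapcs[node](value, c)   # downstream RAPC consumes the routed result and c
        outputs.append(value)               # the switch delivers the final result to the I/O side
    return outputs


print(run_stream([(2, 3, 4), (5, 6, 7)]))  # [10, 37]
```

Changing the algorithm here means rewriting `rapcs` and `route` at setup time, not adding per-sample control code.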
- Pipelining greatly reduces overhead compared with static, non-reconfigurable hardware, which requires intervention by a central or peripheral processor to direct data and results into and out of arithmetic processing units.
- The LDP processes logical decisions and iterative loops, and the LDP provides result memory for learning algorithms.
- An algorithmic matching pipelined compiler (AMPC) generates machine code from a higher-level, compilable software language, such as “C”, “C++”, Pascal, Basic or the like.
- Standard source code written for a conventional, non-reconfigurable, non-pipelined, general purpose computer processor may be processed by the AMPC to generate machine code for configuring one or more of the RAPCs.
- The AMPC generates machine code from standard, preexisting code for a conventional ARM processor or a conventional Intel processor, and the machine code generated by the AMPC precompiler runs on an ARM processor or an Intel processor to configure the RAPCs.
- A new computer system comprises a conventional processor, such as an existing ARM processor, Intel processor, AMD processor or the like, and a plurality of RAPCs, each RAPC comprising a DSP, an LDP and an MBS, for example.
- RAPCs are not merely peripheral co-processors.
- Once the precompiler or AMPC has configured the RAPCs for their task, the RAPCs independently solve complex mathematical and logic algorithms without further intervention by the conventional processor. Values are input into the configured RAPC, and a solution is output to the MBS.
- A plurality of RAPCs are disposed on a single chip, such as a reconfigurable ASIC.
- A reconfigurable ASIC is a chip designed to comprise RAPCs such that each RAPC is reprogrammable for specific operations by an AMPC and a general purpose, existing processor architecture, such as an ARM processor, an AMD processor, an Intel processor or the like.
- A reconfigurable ASIC may contain 2000 RAPCs and may execute 360 trillion instructions per second at a 500 MHz clock speed.
- A single reconfigurable ASIC comprising 2000 RAPCs can run 100 times faster than any conventional, general purpose processor today.
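Taken at face value, the figures above imply a per-core rate that follows from simple division. This only rearranges the stated numbers; it does not verify them.

$$
\frac{360 \times 10^{12}\ \text{instructions/s}}{2000\ \text{RAPCs} \times 500 \times 10^{6}\ \text{cycles/s}}
= \frac{3.6 \times 10^{14}}{1 \times 10^{12}}
= 360\ \text{instructions per RAPC per clock cycle}
$$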
- Existing programs may be ported to a reconfigurable ASIC comprising a plurality of RAPCs and benefit from pipelined, parallel execution of instructions without substantially rewriting the existing high-level code.
- The AMPC precompiles existing code for an ARM general purpose processor architecture that is embedded on a reconfigurable ASIC comprising a plurality of RAPCs.
- This new processor architecture (ICAT) achieves surprising and unexpected performance by combining the ARM processor architecture and a plurality of RAPCs on a chip.
- The embedded ARM processor on the ICAT chip executes machine code instructions, generated by the AMPC from preexisting programs written in a high level programming language such as “C”, that configure the plurality of RAPCs on the ICAT chip to execute instructions at a surprisingly high rate.
- The ARM processor also controls intelligent monitoring, diagnostics and communications with peripherals external to the ICAT chip.
- To the outside world, the ICAT chip appears to be a very fast ARM processor that does not require a mathematical co-processor.
- In another example, the ICAT chip can embed an Intel processor and appear to the outside world as an Intel processor.
- Once configured by the central processing unit, the RAPCs operate without overhead, executing instructions until the math, logic and iterative instructions for which they have been configured are complete.
- The AMPC extracts configuration data for the setup registers of an ICAT chip from a program written for a standard processing architecture in a high level programming language, such as “C”. For example, the AMPC ignores overhead instructions and generates code for the setup registers of the ICAT chip from the program's 1) arithmetic instructions and data; 2) logic decisions and data; 3) branch or call/return instructions and destinations; 4) iterative loops, decisions and data; 5) DSP setup routines and data; and 6) code entry point labels for loops and branches.
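A toy version of this extraction step can be sketched in Python, assuming simple one-statement-per-line C and using keyword and regular-expression matching rather than a real parser (a real AMPC would presumably work on a parsed program). It covers five of the six listed categories; DSP setup routines are omitted because they cannot be recognized by keyword. The category names and patterns are hypothetical.

```python
import re

# Hypothetical keyword/regex patterns; the first match wins, so order matters.
CATEGORIES = [
    ("entry_label", re.compile(r"^[A-Za-z_]\w*\s*:$")),
    ("iterative_loop", re.compile(r"^(for|while|do)\b")),
    ("logic_decision", re.compile(r"^(if|else|switch)\b")),
    ("branch_call_return", re.compile(r"^(goto|return)\b|\b[A-Za-z_]\w*\s*\([^;]*\)\s*;$")),
    ("arithmetic", re.compile(r"[A-Za-z_]\w*\s*[-+*/]?=\s*[^=]")),
]


def classify(source: str) -> dict[str, list[str]]:
    """Sort each non-blank line into a category; unmatched lines count as overhead."""
    buckets: dict[str, list[str]] = {name: [] for name, _ in CATEGORIES}
    buckets["overhead"] = []
    for line in source.splitlines():
        text = line.strip()
        if not text:
            continue
        for name, pattern in CATEGORIES:
            if pattern.search(text):
                buckets[name].append(text)
                break
        else:
            # Declarations, braces and similar lines produce no setup data here.
            buckets["overhead"].append(text)
    return buckets


SOURCE = """
int i1, i2, s;
loop:
for (int k = 0; k < 4; k++) {
    s = i1 + i2;
    if (s > 10)
        goto loop;
}
report(s);
return s;
"""

for category, lines in classify(SOURCE).items():
    print(f"{category}: {lines}")
```

The declaration and the closing brace end up in `overhead`, mirroring the idea that such lines contribute nothing to the setup registers.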
- The AMPC is aware of the RAPC resources and assigns them while precompiling code written in the high level programming language.
- The AMPC may configure the ICAT architecture to optimize usage of the RAPC resources, such as by minimizing interconnect length between instructions executed by the plurality of RAPCs. This optimization may be completed by an interactive approach or a trial and error approach.
- The AMPC comprises a learning algorithm that improves the optimizations based on historical patterns of usage of certain instructions, whether mathematical algorithms, logical algorithms or a combination of both, such as by minimizing the use of the MBS for a branch or call to a destination for common instruction sets. For an example of an MBS implementation, see the ARM MBS example in the background.
- Status data includes carry out, equal, greater than and less than, for example.
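A minimal sketch of how such status data could accompany a result. It assumes, without support from the source, that carry out comes from a 48-bit unsigned addition of two operands and that equal, greater than and less than come from an unsigned comparison of the same operands. The 48-bit width follows the DSP accumulator described in this document; the names are hypothetical.

```python
from typing import NamedTuple

WIDTH = 48                    # matches the 48-bit accumulator described for the DSP
MASK = (1 << WIDTH) - 1


class Status(NamedTuple):
    carry_out: bool
    equal: bool
    greater_than: bool
    less_than: bool


def add_with_status(a: int, b: int) -> tuple[int, Status]:
    """Hypothetical: add two 48-bit operands and report status flags alongside the result."""
    a &= MASK
    b &= MASK
    total = a + b
    result = total & MASK          # the 48-bit result wraps around
    status = Status(
        carry_out=total > MASK,    # the addition produced a bit beyond bit 47
        equal=a == b,
        greater_than=a > b,
        less_than=a < b,
    )
    return result, status


print(add_with_status(MASK, 1))    # wraps to 0 with carry_out=True
print(add_with_status(7, 7))
```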
- Each LDP has a setup interface for programming a lookup table, a loop counter, and a constant register.
- Each LDP has a “Loop Counter” for detecting when iterative algorithms are completed.
- Each LDP has a register that can hold constant data for input to the lookup table.
- Each LDP has a block of memory, which can be used to perform functions.
- Lookup table functions may include a lookup table that can be implemented and sequentially accessed using the loop counter; a lookup table that can be implemented and accessed by the DSP status, the constant register, or the DSP result data for control purposes; and a logic lookup table that can be implemented and output miscellaneous logic signals for control purposes, for example.
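A software model of the LDP features just listed can help fix the ideas. The class and method names, the modulo wrap of the table address and the rule that the loop is complete once the counter reaches `loop_count` are assumptions; the source describes the features, not their exact behavior. Only the counter-addressed and constant-addressed lookups are modeled.

```python
from dataclasses import dataclass, field


@dataclass
class LDP:
    lookup_table: list[int]   # contents programmed through the setup interface
    loop_count: int           # iterations before the algorithm is considered complete
    constant: int = 0         # constant register, usable as an alternative table index
    counter: int = field(default=0, init=False)  # the loop counter itself

    def step(self) -> int:
        """Return the next table entry, addressed sequentially by the loop counter."""
        value = self.lookup_table[self.counter % len(self.lookup_table)]
        self.counter += 1
        return value

    def done(self) -> bool:
        """Detect completion of the iterative algorithm from the loop counter."""
        return self.counter >= self.loop_count

    def lookup_by_constant(self) -> int:
        """Address the table with the constant register instead of the counter."""
        return self.lookup_table[self.constant % len(self.lookup_table)]


ldp = LDP(lookup_table=[3, 1, 4, 1, 5], loop_count=3, constant=4)
values = []
while not ldp.done():
    values.append(ldp.step())
print(values, ldp.lookup_by_constant())   # [3, 1, 4] 5
```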
- The LDP may pass result data from its input to its output.
- The LDP may have one pipeline register for result data at its output, for example.
- The LDP may have two pipeline registers with synchronous clear enables for result data at its output.
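A cycle-level sketch of the two-register variant follows. It assumes each register has its own synchronous clear that forces it to 0 at the next clock edge; the reset value, class name and signal names are hypothetical.

```python
class TwoStageOutput:
    """Hypothetical model of two output pipeline registers with synchronous clears."""

    def __init__(self) -> None:
        self.r1 = 0   # first pipeline register
        self.r2 = 0   # second pipeline register (drives the output)

    def clock(self, data_in: int, clear1: bool = False, clear2: bool = False) -> int:
        # Compute the next state from the current state, then update both registers
        # together, as flip-flops sharing one clock edge would.
        next_r2 = 0 if clear2 else self.r1
        next_r1 = 0 if clear1 else data_in
        self.r1, self.r2 = next_r1, next_r2
        return self.r2


pipe = TwoStageOutput()
outputs = [pipe.clock(x) for x in [10, 20, 30, 40]]
print(outputs)   # [0, 10, 20, 30]
```

Because a clear only takes effect at a clock edge, the model never changes register contents between calls to `clock`.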
- The chip may be an ICAT chip comprising a plurality of RAPCs, each comprising a DSP, an LDP and an MBS and each set up by code provided by the AMPC to a conventional processor.
- The AMPC creates machine code from “C” language source code for operation of a conventional processor, such as an ARM processor, and the ARM processor sets up each of the DSP, LDP and MBS portions of each of the RAPCs that will be used in processing data input to the processor and outputting data from the processor.
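The setup step can be pictured as the host processor writing configuration words into each RAPC before any data flows. In this sketch the register names, offsets, stride, opcodes and route IDs are all invented for illustration, and a Python dict stands in for what would be memory-mapped setup registers on real hardware.

```python
# Hypothetical setup-register layout for one RAPC (offsets in bytes).
SETUP_OFFSETS = {"dsp_opcode": 0x0, "ldp_loop_count": 0x4, "ldp_constant": 0x8, "mbs_route": 0xC}
RAPC_STRIDE = 0x10   # assumed spacing between consecutive RAPC setup blocks

setup_space: dict[int, int] = {}   # stands in for memory-mapped setup registers


def configure_rapc(index: int, config: dict[str, int]) -> None:
    """Write each named setup value to its register inside RAPC number `index`."""
    base = index * RAPC_STRIDE
    for name, value in config.items():
        setup_space[base + SETUP_OFFSETS[name]] = value


# Setup code of the kind an AMPC might emit for a two-RAPC multiply-then-add chain.
# Opcode 0x2 = multiply, 0x1 = add, route 0xFF = I/O controller (all hypothetical).
configure_rapc(0, {"dsp_opcode": 0x2, "ldp_loop_count": 1024, "mbs_route": 1})
configure_rapc(1, {"dsp_opcode": 0x1, "ldp_loop_count": 1024, "mbs_route": 0xFF})

for addr in sorted(setup_space):
    print(f"0x{addr:02X}: {setup_space[addr]}")
```

After this one-time pass, the host would only need to supply input values; the configured cores do the rest.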
- A system for configuring a reconfigurable processor comprises a non-reconfigurable processor, a plurality of reconfigurable cores, and an Algorithmic Matching Pipelined Compiler capable of accepting code written in a high level programming language for the non-reconfigurable processor, wherein the Compiler identifies code written in the high level programming language that could benefit from pipelining available on one or more of the plurality of reconfigurable cores and outputs code for the non-reconfigurable processor to set up the one or more of the plurality of reconfigurable cores.
- A processor comprises a non-reconfigurable processor core and a plurality of Reusable Algorithmic Pipelined Cores coupled to the non-reconfigurable processor core such that the non-reconfigurable processor core is capable of configuring and reconfiguring each of the plurality of Reusable Algorithmic Pipelined Cores as a result of instructions received from an Algorithmic Matching Pipelined Compiler.
- The processor is contained in a single chip.
- An Algorithmic Matching Pipelined Compiler or AMPC is a compiler capable of accepting code written in a high level programming language for a conventional non-reconfigurable processor, wherein the AMPC identifies code written in the high level programming language that could benefit from pipelining available on a reconfigurable core or processor, such as an RAPC or Field Programmable Gate Array, and outputs code for a non-reconfigurable processor, which instructs the non-reconfigurable processor to configure the reconfigurable core or processor, prior to providing instructions for using the reconfigurable core or processor.
- A Reusable (or reconfigurable) Algorithmic Pipelined Core (or computer), or RAPC, is defined as a reconfigurable processing core with a pipelined structure comprising: a DSP including a setup interface for programming any of a plurality of operations, such as integer and floating point math, with four inputs for operand data that can be concatenated or operated on with various combinations of mathematical functions as determined by the setup data, and a 48-bit accumulator that is output as result data along with the status data; an LDP having a setup interface for programming a lookup table, a loop counter and a constant register, and a block of memory, which can be used to perform functions; and an MBS.
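A behavioral sketch of such a DSP follows. It uses a few operand combinations selected by a setup-time mode and a 48-bit wrapping accumulator. The mode names, the particular combinations and the 18-bit width used for concatenation are assumptions for illustration, not the patent's specification.

```python
MASK48 = (1 << 48) - 1


class DSPSlice:
    """Hypothetical DSP model: four operand inputs, mode chosen at setup, 48-bit accumulator."""

    def __init__(self, mode: str, accumulate: bool = False) -> None:
        self.mode = mode              # selected through the setup interface
        self.accumulate = accumulate  # add each new value into the accumulator
        self.acc = 0                  # 48-bit accumulator, output as result data

    def clock(self, a: int, b: int, c: int, d: int) -> int:
        if self.mode == "mul_add":
            value = a * b + c
        elif self.mode == "pre_add_mul":
            value = (a + d) * b
        elif self.mode == "concat":
            value = (a << 18) | (b & 0x3FFFF)   # A:B concatenation, B assumed 18 bits wide
        else:
            raise ValueError(f"unknown mode {self.mode!r}")
        self.acc = ((self.acc + value) if self.accumulate else value) & MASK48
        return self.acc


# Dot product of [1, 2, 3] and [4, 5, 6] by multiply-accumulate (c = d = 0).
dsp = DSPSlice("mul_add", accumulate=True)
for x, y in zip([1, 2, 3], [4, 5, 6]):
    result = dsp.clock(x, y, 0, 0)
print(result)   # 32
```

Reconfiguring the slice here means constructing it with a different mode; the per-sample call stays the same.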
- An MBS is defined as a reconfigurable, programmable circuit that routes data and results from one RAPC to another RAPC and from/to an input/output controller, and/or interrupt generators, as required, to complete an algorithm, without any further intervention from a central or peripheral processor during the processing of the algorithm.
- FIG. 1A illustrates a prior art flow chart for a conventional compiler.
- FIG. 1B illustrates a prior art processor for a conventional computer.
- FIG. 2 illustrates a block diagram from U.S. Pat. No. 5,684,980.
- FIG. 3 is a flow chart illustrating an example of an AMPC compiler for comparison with the flow chart in FIG. 1A.
- FIG. 4 is an example of an ICAT architecture.
- FIG. 5 shows a flow diagram of an example of how a programmer may use an AMPC.
- FIG. 6 is a schematic example of a reusable algorithmic pipelined computer.
- FIG. 8 illustrates a dramatic benefit from the raw processing power of the example of FIG. 7 by real time lossless data compression in a consumer electronic device.
- An ICAT architecture mimics any standard microprocessor unit architecture. It takes advantage of pipelining and of the much richer gate density of integrated circuits designed to be configured by a customer or designer after manufacturing, such as one or more field programmable gate arrays (FPGAs). This yields a 100:1 advantage in MIPS over a single standard microprocessor architecture running at the same clock speed.
- FPGAs contain an array of programmable logic blocks, and a hierarchy of reconfigurable interconnects that allow the blocks to be “wired together”, like many logic gates that can be inter-wired in different configurations.
- Logic blocks can be configured to perform complex combinational functions, or merely simple logic gates like AND and XOR. In most FPGAs, logic blocks also include memory elements, which may be simple flip-flops or more complete blocks of memory.
- The very large jump in performance allows the processor to be used for data intensive applications, such as machine vision, video processing, audio processing, robotics control systems, multi-axis control systems, mobile communications, virtual reality, artificial intelligence, livestreaming, biometric monitoring, the Internet of Things, supercomputing, quantum computing, aerospace control systems, simulation and modeling of complex systems, and signal processing applications, for example.
- The ICAT may be run in a configuration with as many parallel processors as an application needs, increasing performance even further compared to standard microprocessors.
- A plurality of processor architectures may be run simultaneously.
- Legacy code may be run on a virtual machine compatible with that code, while a new virtual machine runs code written specifically for the new architecture. In one example, this reduces the need for extensive regression testing, such as would be required to adapt legacy code to the new system architecture.
- The speed and expandability of the ICAT architecture are applied to legacy systems that cannot process the required volume of data. This provides raw speed and expandability for customers whose code and/or hardware has run into limitations.
- Reconfiguration is compiled at or before power-up, greatly simplifying planning with little impact on final product performance.
- An FPGA serves as host hardware for this architecture. Millions of instructions per second (MIPS) may be added easily, without major rewrites to existing code. Existing code may be run almost unmodified, apart from recompilation. For example, algorithms requiring parallel processing of a large number of common inputs are ideal candidates for the ICAT architecture.
- The ICAT architecture comprises a front end pre-compiler that catches potential code incompatibility issues and resolves them automatically.
- The ICAT architecture may emulate a variety of processor architectures familiar to different developers.
- The ICAT architecture may emulate more than one processor. This allows a project to be coded for several developers' favored processors and to run code on several different virtual processors at the same time.
- A plurality of different processors may run different code sets in a multi-processing environment. Program developers compile code for whichever of the domains is compatible with that code.
- The ICAT architecture includes a compiler or pre-compiler that checks legacy code for hardware-specific commands. It is optimized for use with a high level programming language, such as C or C++.
- FIG. 1A and FIG. 3 together illustrate the additional steps included in an Algorithmic Matching Pipelining Compiler (AMPC), for example.
- The ICAT architecture provides a set of standard multi-processing/multitasking peripherals with built-in coordination.
- A real time operating system may be adopted.
- A multi-tasking, real time operating system is incorporated into the ICAT architecture.
- Micro-Controller Operating Systems (MicroC/OS) is a real-time operating system designed by embedded software developer Jean J. Labrosse in 1991. It is a priority-based, pre-emptive real-time operating system for microprocessors, written mainly in the C programming language, a higher level programming language.
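Priority-based pre-emptive scheduling of the kind described above can be sketched as follows. This is a hypothetical illustration, not MicroC/OS source code. It assumes, as MicroC/OS does, that a lower priority number means a higher priority. Ready tasks are kept in a bitmap, so choosing the next task reduces to finding the lowest set bit. The class and function names are invented for the example.

```python
from typing import Optional


class ReadyList:
    """Hypothetical ready list: bit p is set when the task at priority p is ready."""

    def __init__(self) -> None:
        self.bitmap = 0

    def make_ready(self, prio: int) -> None:
        self.bitmap |= 1 << prio

    def make_waiting(self, prio: int) -> None:
        self.bitmap &= ~(1 << prio)

    def highest_ready(self) -> Optional[int]:
        """Return the numerically lowest (highest-priority) ready task, or None."""
        if self.bitmap == 0:
            return None
        # x & -x isolates the lowest set bit; bit_length() - 1 gives its index.
        return (self.bitmap & -self.bitmap).bit_length() - 1


def should_preempt(running: int, ready: ReadyList) -> bool:
    """Pre-empt when a ready task outranks (has a lower number than) the running task."""
    best = ready.highest_ready()
    return best is not None and best < running
```

For example, with tasks 5 and 12 ready, a task running at priority 8 is pre-empted in favor of task 5. Once task 5 blocks, task 12 no longer outranks it.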
- The raw speed of the ICAT architecture allows use of such a real-time operating system (RTOS), for example.
- A pipelined architecture is achieved using standard Verilog or VHDL code.
- A 1024-word instruction cache, a data cache, and multi-level memory cache architectures may be provided in the ICAT architecture.
- Pipelining in the ICAT architecture may include a learning algorithm that detects which way branching on decision processing tends to go. It then makes that path the default path on future passes through the learning algorithm.
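The text does not say how the branch-learning algorithm works. One common technique with the described behavior is a per-branch 2-bit saturating counter. The sketch below uses that textbook technique as a stand-in; it is an assumption, not the patent's method. The counter makes the recently dominant outcome the default path, and a single contrary outcome does not immediately flip it.

```python
from collections import defaultdict


class TwoBitPredictor:
    """Per-branch 2-bit saturating counters: states 0-1 predict 'not taken', 2-3 'taken'.

    A standard technique used as a stand-in; not taken from the patent.
    """

    def __init__(self) -> None:
        # Every branch starts in the "weakly not taken" state (1).
        self.counters = defaultdict(lambda: 1)

    def predict(self, branch_id: int) -> bool:
        """Return the current default path: True means 'taken'."""
        return self.counters[branch_id] >= 2

    def update(self, branch_id: int, taken: bool) -> None:
        """Move the counter toward the observed outcome, saturating at 0 and 3."""
        c = self.counters[branch_id]
        self.counters[branch_id] = min(c + 1, 3) if taken else max(c - 1, 0)
```

The two bits of hysteresis are the design choice here. A loop-closing branch that is taken many times and then falls through once keeps "taken" as its default path.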
- Interrupt code is isolated, and an interrupt handler with a private code location is dedicated to specific inputs.
- The ICAT architecture includes a multi-processor debugger. For example, existing code may be processed by a pre-processing debugger to ensure that it is well partitioned, so that functions are separated. A single debugger may then be run on each independent thread of an operation.
- A reconfigurable algorithmic pipelined core may be provided in a 2 inch chip package that provides MIPS and mega-FLOPS equivalent to more than 1,000 Intel i7 microprocessors, and more preferably more than 10,000 Intel i7 microprocessors.
- The ICAT architecture's compiler or pre-compiler detects incompatible timing code, such as low level timing loops that count clock cycles and delays that allow instruction fetching. It flags such code for repair or replacement, either manually or automatically, with compatible higher level programming provided within the ICAT architecture.
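A minimal sketch of such flagging is shown below, assuming C source and a few hypothetical heuristics: empty `for`/`while` delay loops and inline `nop` assembly. A real pre-compiler would work on a parsed syntax tree rather than on regular expressions. The pattern names and the function are invented for illustration.

```python
import re
from typing import List, Tuple

# Hypothetical heuristics for cycle-counting code. [^()]* refuses nested
# parentheses, so ambiguous loops are missed rather than falsely flagged.
TIMING_PATTERNS = [
    ("empty for-loop delay", re.compile(r"\bfor\s*\([^()]*\)\s*;")),
    # Anchored at line start so the tail of a do { ... } while (x); is not flagged.
    ("empty while-loop delay", re.compile(r"^\s*while\s*\([^()]*\)\s*;")),
    ("inline NOP", re.compile(r"\b(?:__)?asm(?:__)?\b[^;]*\bnop\b", re.IGNORECASE)),
]


def flag_timing_code(source: str) -> List[Tuple[int, str]]:
    """Return (line_number, reason) pairs for lines that look like timing code."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for reason, pattern in TIMING_PATTERNS:
            if pattern.search(line):
                findings.append((lineno, reason))
    return findings
```

The flagged lines are candidates for replacement with higher level timing constructs. Deciding what replaces them is left to the developer or to later compiler stages.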
- The ICAT architecture comprises an algorithmic matching pipeline compiler (AMPC), a compiler that accepts processing algorithms in standard source code formats.
- The AMPC generates firmware for a conventional processing system operable with the ICAT architecture.
- The compiler generates instructions that configure the ICAT hardware. As a result, the architecture processes algorithms with better performance than traditional microprocessors, which the AMPC cannot reconfigure.
- The AMPC uses pipelining to optimize processor performance for applications requiring algorithm-intensive computation. For example, this firmware may be run on a conventional processing system to configure ICAT hardware architectures that process algorithms with optimal performance.
- The AMPC compiles source code written for conventional compilers into code that operates the ICAT hardware. That code configures the ICAT architecture's processor resources to process algorithms directly.
- The AMPC accepts source code compatible with conventional compilers, such as code written in C, C#, C++, or MATLAB.
- Firmware generated by the AMPC runs on the main processing system of the ICAT architecture.
- The main processing system is a conventional processor on the same chip as the rest of the ICAT architecture, and it operates seamlessly with it.
- The AMPC accepts source code written in high level programming languages, such as C, C#, and C++, and outputs firmware for the ICAT architecture that runs on the main processing system. This simplifies programming the ICAT architecture, because its firmware can be written in a higher level programming language familiar to the developer.
- The raw speed of the ICAT architecture eliminates the performance penalty of high level languages and reduces any need to write machine level code to optimize speed.
- The AMPC may compile software syntax, such as an if-then-else process, into firmware that reconfigures the ICAT architecture's hardware. The reconfigured hardware then executes the process optimally in fewer clock cycles, using pipelining, for example.
- Running the firmware configures the ICAT architecture.
- Conventional compilers build firmware that all conventional processors use, but that firmware does not reconfigure the processors.
- The AMPC builds firmware that configures the ICAT architecture for optimal operation in a particular application, for example.
- The AMPC selects and structures the configuration of the ICAT hardware, using the algorithms as the input structure for the ICAT architecture's processor hardware.
- When configured by AMPC-generated firmware, the hardware of the ICAT architecture is optimized for processing speed in a particular application.
- The AMPC can reconfigure the hardware of the ICAT architecture, whereas a conventional compiler cannot reconfigure the hardware of the ICAT or of any microprocessor.
- The AMPC generates firmware that configures the ICAT architecture's processors to perform pipelined processing directly and to route data in hardware based on prior results. For example, an if-then-else logic statement input into the AMPC would structure the ICAT hardware to route data results to the next ICAT.
- The AMPC generates hardware configurations for the same if-then-else logic that eliminate the overhead a conventional processing system incurs, such as code fetching, data loading, data storing, branching, and subroutine calls.
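To illustrate the difference, the Python sketch below models how an if-then-else might behave once mapped onto hardware instead of executed as branching code. Both arms are computed for every sample, as parallel datapaths would be, and the condition only drives the select line of a 2:1 multiplexer. This is a behavioral model under that assumption, not AMPC output, and the function names are hypothetical.

```python
from typing import Callable, Iterable, Iterator


def if_then_else_stage(
    cond: Callable[[int], bool],
    then_fn: Callable[[int], int],
    else_fn: Callable[[int], int],
) -> Callable[[Iterable[int]], Iterator[int]]:
    """Build a streaming stage that models an if-then-else as a multiplexer."""

    def stage(samples: Iterable[int]) -> Iterator[int]:
        for x in samples:
            a, b = then_fn(x), else_fn(x)  # both datapaths are always active
            sel = int(cond(x))             # comparator output drives the select line
            yield (b, a)[sel]              # 2:1 mux: select 1 picks the 'then' path
    return stage
```

For example, a clamp to 255 built with `if_then_else_stage(lambda x: x > 255, lambda x: 255, lambda x: x)` maps the stream `[10, 300, 255]` to `[10, 255, 255]`. No instruction is fetched per sample and no branch is taken.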
- FIG. 4 illustrates an example of an ICAT architecture.
- a conventional compiler such as Visual Studio
- This provides a method for configuring and reconfiguring reprogrammable pools of hardware in a chip to run various types of processing algorithms.
- A conventional processing system (e.g., Intel, ARM, IBM, or AMD microprocessors) cannot be reconfigured to run various algorithms, because only its software, not its hardware, can change.
- The ICAT architecture of FIG. 4 provides reconfigurable hardware for efficient data processing. It uses a pool of parallel processor resources implemented in a system on chip (SOC) device 100.
- A pool of mathematic processors 107, followed by logic processors 108 and configurable matrix routing 109, implements a pool of parallel processing resources 102.
- This architecture can pipeline processing resources to optimize processing performance for particular applications.
- The pool of processors 102 performs multiple processing tasks independently of the main processor 101, without receiving further instructions from it.
- Each ICAT may be configured to process an entire algorithm as a standalone processor system.
- An ICAT can be considered a system in itself. Once configured to perform an algorithm, it requires no overhead to complete processing of that algorithm.
- An ICAT may be configured to perform an if-then-else instruction set and later reconfigured to perform a completely different instruction set, such as a fast Fourier transform or another mathematical algorithm.
- By reducing unnecessary cycles of activity, the ICAT architecture consumes less power, generates less heat, and processes data faster than a conventional processor.
- The ICAT resources 102 remain idle until they are configured and data is ready to be processed at their inputs. All of the processors are kept idle when not needed, which reduces heat generated by unnecessary overhead.
- Each processor in the pool of ICAT resources has less overhead than a conventional processor, because the ICAT does not fetch and execute code. Instead, the hardware is configured to perform a specific operation. It is active only when it receives data that needs to be processed with the configured algorithm.
- A single ICAT processor uses a pool of mathematic processors 107 and logic processors 108, with output steered by configurable matrix routing 109.
- The same ICAT processor may be used for a simple processing task, such as an if-then-else, or for a very complex algorithm, such as one used in facial recognition.
- The ICAT architecture may be used for processing tasks that require many calculations in a pipelined architecture, such as motion, shape, or identity detection, for example.
- The algorithm controls the interconnect bus structure of the ICAT processors, and the ICAT architecture processes input data streams from output devices 112, such as video, sensors, or data from a previous process step. For example, prior results may be streamed from data memory buffers, live input data, or any data from other processing steps 110, 111. Processing results may be output directly to devices 113, such as control output or video output, for example.
- FIG. 5 illustrates a six-step flow diagram for a programmer, who first inputs the original high level programming language source code into a first compiler. In FIG. 5, the AMPC is referred to as ASML.
- In step 2, the ASML pre-compiler automatically extracts code from the original source. The pre-compiler then outputs new source code to a second compiler. This can happen either automatically or as a separate step by the programmer, once the programmer is satisfied that the new source is debugged and optimized.
- The second compiler compiles a firmware build for the ICAT architecture. The firmware is then loaded into the ICAT architecture, where it configures the RAPCs. The programmer may upload this firmware into the ICAT architecture once satisfied that it is debugged and optimized, for example.
- Each of the steps may be automated and may occur without human intervention, except for loading the original source code into the ICAT architecture.
- The entire process may be automated. In that case, the conventional processor runs the AMPC to recompile the original source code into firmware. The conventional processor then uses that firmware to set up the RAPCs according to the instructions in the original source code.
- A pool of ICAT resources may contain three types of processor modules, for example: mathematics modules, logic modules, and result routing modules. Mathematics modules perform math functions, logic modules perform logic functions, and result routing modules perform branching and data routing functions.
- A setup bus 109 is established when the AMPC configures the setup registers of the ICAT architecture. Operands are directed to memory locations A, B, C and D on a digital signal processor (DSP) 110, which is configured to execute a mathematical algorithm. Results of the algorithm are directed to a logical decision processor (LDP) 111, which executes logical instructions. Results of the logical instructions are delivered to the next RAPC, either directly or via the matrix bus switch (MBS). The MBS directs results to the next RAPC, or controls inputs, outputs, and interrupts to deliver the results on a high speed streaming interface.
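The DSP, LDP, and MBS chain described above can be sketched as a small behavioral model. It assumes setup data written once before streaming begins, two operands instead of the four inputs A to D, a DSP result masked to the 48-bit accumulator width, and an MBS reduced to choosing between two destinations. `RapcSetup`, `rapc_step`, and the destination labels are hypothetical names, not the patent's interfaces.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

MASK48 = (1 << 48) - 1  # width of the DSP accumulator

DSP_OPS: Dict[str, Callable[[int, int], int]] = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
}


@dataclass
class RapcSetup:
    """Hypothetical setup data loaded over the setup bus before data streams in."""
    dsp_op: str                      # DSP operation, e.g. "add", "sub", "mul"
    ldp_test: Callable[[int], bool]  # LDP logical decision on the DSP result
    route_true: str                  # MBS destination when the test passes
    route_false: str                 # MBS destination otherwise


def rapc_step(setup: RapcSetup, a: int, b: int) -> Tuple[str, int]:
    """Push one operand pair through DSP -> LDP -> MBS; return (destination, result)."""
    result = DSP_OPS[setup.dsp_op](a, b) & MASK48             # DSP, 48-bit wraparound
    passed = setup.ldp_test(result)                           # LDP decision
    dest = setup.route_true if passed else setup.route_false  # MBS steering
    return dest, result
```

For instance, with an "add" setup whose LDP test is `result > 100`, the operands 60 and 50 are routed to the "true" destination with result 110. Note that `2**47 * 2` wraps to 0 in the 48-bit accumulator.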
- Hardware resources may be configured into ICAT co-processor systems that are interconnected in a pipelined structure for optimal performance.
- Hardware resources for configuring ICAT processors may be designed into the chip, and the hardware resources in the chip are re-configurable via AMPC.
- The architecture of an ICAT processing system is configured from the source code for processing algorithms, for example.
- Code generated for a conventional processor may run much more efficiently on an ICAT architecture. This is because the AMPC uses the source code to configure the ICAT processors' hardware to perform algorithms independently of the main processor, for example.
- The ICAT architecture can configure the ICAT hardware architecture from source code created for a conventional microprocessor, which has not been known in the art.
- A processor can configure and reconfigure a pool of hardware resources into algorithmic matrix structures. The pool then actually processes a plurality of processing algorithms in a chip.
- The hardware resources process data through a plurality of commands independently of other processors, using pipelining.
- An AMPC configures hardware resources for running a plurality of processing algorithms.
- The AMPC generates the configuration setup firmware used to configure processing algorithms from the pool of ICAT resources in an ICAT chip. This gives a programmer a tool that accepts existing application source code designed for a conventional processor. It also accepts new source code designed for matching and assigning ICAT hardware resources, creating individual hardware processing algorithms within the ICAT architecture.
- The AMPC generates the firmware that runs on the main processor to configure the ICAT hardware. The configured hardware then performs a plurality of algorithms independently of the main processor while the SOC operates for a particular purpose.
- Conventional processors use a similar architecture. It comprises program memory plus fetch and execution hardware for step-by-step execution of program instructions, and data memory for storing bulk (heap) data and program stack structures. Instruction fetch and execution cycles, program stack management, and data heap management all create considerable overhead in a conventional processor architecture.
- An ICAT architecture eliminates almost all of the overhead of conventional processor systems.
- The AMPC configures the ICAT hardware pool, which then processes algorithms using the ICAT co-processor architecture with pipelined streaming data structures.
- A method using the ICAT architecture proceeds as follows. The AMPC accesses ICAT hardware compiler tables that define the resources available in the chip. A hardware design language, such as Verilog, is used to compile the pool of ICAT hardware 102 for a given processor, and this hardware compilation outputs tables that define the structure of the ICAT resource pools within the chip. The AMPC uses these tables to determine the locations and quantities of ICAT resources in the chip. It then assigns hardware resources, configures math and logic operations, and creates interconnections for the various algorithms; the source input syntax for the AMPC may comprise C# syntax or standard mathematical syntax, such as MATLAB. Finally, the AMPC configures a pipelined structure for each algorithm from the pool of available ICAT hardware resources 103.
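The resource-assignment step can be sketched as a greedy allocator. This version assumes a hypothetical resource table that maps each resource id to its kind (`"dsp"`, `"ldp"`, `"mbs"`), as a hardware compiler might emit, and an algorithm given as an ordered list of operations. Consecutive operations are chained into a pipeline. The patent does not specify the AMPC's allocation strategy, so the first-fit choice here is only illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical hardware-compiler output: resource id -> resource kind.
ResourceTable = Dict[str, str]


def assign_resources(
    table: ResourceTable,
    ops: List[Tuple[str, str]],
) -> Tuple[Dict[str, str], List[Tuple[str, str]]]:
    """Bind each (op_name, kind) to the first free resource of that kind.

    Returns the binding plus the interconnections that chain consecutive
    operations into a pipeline. Raises RuntimeError if a pool is exhausted.
    """
    free: Dict[str, List[str]] = defaultdict(list)
    for res_id, kind in sorted(table.items()):
        free[kind].append(res_id)

    binding: Dict[str, str] = {}
    for op_name, kind in ops:
        if not free[kind]:
            raise RuntimeError(f"no free {kind} resource for {op_name}")
        binding[op_name] = free[kind].pop(0)

    # Pipeline wiring: each stage's output feeds the next stage's input.
    links = [(binding[a], binding[b]) for (a, _), (b, _) in zip(ops, ops[1:])]
    return binding, links
```

A production allocator would also weigh physical locations from the tables to keep interconnections short. First-fit keeps the sketch readable.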
- The AMPC outputs code that runs on the main processing system 101 and configures the control registers 103, 104, 105, 106 of the resources that run algorithms on the parallel ICAT co-processors 102.
- A co-processor system structure may be configured from a pool of ICAT resources 102, which respond to input from a main processor 101, for example.
- A pool of ICAT resources 102 may generate interrupts and output data to the main processor 101. If the main processor architecture includes input/output devices separate from the main processor, the pool may instead send them to those devices.
- A pool of ICAT resources 102 may be configured by a conventional processor 101, after which the ICAT resources 102 run on their own until reconfigured.
- Once configured by the firmware, the ICAT architecture's processors continuously process data streams in parallel on their own.
- A conventional system, by contrast, must continually go to memory and fetch instructions to determine the process flow at each processing step.
- The AMPC may assign a group of hardware resources, such as math, logic, and routing, to a particular ICAT processor structure of the ICAT architecture. That structure then executes the processing steps of a particular algorithm, for example.
- No conventional compiler selects and configures the hardware structures of a microprocessor. For example, when the AMPC builds the hardware structure of the ICAT architecture, it may configure the hardware resources in a pipelined architecture that speeds processing performance. A conventional compiler cannot do this.
- ICAT Control Registers 104 are a set of registers for controlling processing functions.
- A digital signal processor (DSP) Input Mode Register may include Split Input Words, Pre-Adder Control, Input Register Bank Select, and other DSP input functions. A DSP ALU Mode Register may control add, subtract, multiply, divide, shift right, shift left, rotate, AND, OR, XOR, NOR, NAND, and other logic processes. DSP Multiplexor Selects may control shifts and input selects.
- The DSP may use one DSP48E1 for each ICAT.
- The DSP48E1 devices may be provided in a Xilinx 7 series field programmable gate array.
- ICAT memory and logic operations 105 may be used to control memory and memory logic operations.
- In one example, a motion detection algorithm is written in the C language for use on a general purpose computer.
- FIG. 7 is a schematic diagram of the hardware configuration that results from compiling Code Example 1 with an AMPC compiler.
- A video device 111 has two outputs: a stream of live video pixels 113 and a frame delay buffer stream 112.
- Each pixel comprises red, green, and blue components.
- The DSP 115 compares the live feed with the delayed feed, and the result is pipelined 117 to the LDP 116, which determines whether motion is detected.
- The result is output by the MBS of the RAPC 114.
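Code Example 1 is not reproduced in this text, so the following is a hypothetical stand-in for the per-pixel work in FIG. 7. It assumes the DSP stage computes the sum of absolute RGB differences between the live and frame-delayed pixels. It also assumes the LDP stage compares that sum with a threshold, set here to an arbitrary 48. Both the metric and the threshold are illustrative choices.

```python
from typing import Iterable, Iterator, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), 8 bits each


def detect_motion(
    live: Iterable[Pixel],
    delayed: Iterable[Pixel],
    threshold: int = 48,  # hypothetical value, not from the source
) -> Iterator[bool]:
    """Yield True for each pixel position whose colour changed by more than threshold."""
    for (r1, g1, b1), (r0, g0, b0) in zip(live, delayed):
        # DSP-like stage: sum of absolute channel differences.
        diff = abs(r1 - r0) + abs(g1 - g0) + abs(b1 - b0)
        # LDP-like stage: logical decision against the threshold.
        yield diff > threshold
```

In software, each pixel costs a loop iteration plus several loads, subtractions, and a compare. In the configured RAPC, the same two stages are fixed hardware that accepts a new pixel on every pixel clock.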
- A single RAPC is configured to implement the three processing blocks, which execute in parallel every clock cycle. In comparison, a conventional processing system must execute 37 instructions per video pixel to detect motion.
- Configured by an AMPC compiler from Code Example 1, the single RAPC processes a continuous stream of pixels using the video's pixel clock.
- The three processing blocks (DSP, LDP, and MBS) are implemented in a pipelined, streaming FPGA configuration with three clock cycles of latency. Once the pipeline is filled, after the first three cycles of the video's pixel clock, one processed pixel is output every clock cycle, compared with one pixel per 111 clock cycles.
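The latency-versus-throughput behavior can be checked with a small clocked simulation. This is a generic model of a register-separated pipeline, assuming one output register per stage and all registers updating on the same clock edge. The stage functions are arbitrary placeholders, not the DSP, LDP, and MBS logic.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def run_pipeline(
    stages: Sequence[Callable[[int], int]],
    inputs: Sequence[int],
) -> List[Tuple[int, int]]:
    """Clock a register-separated pipeline and return (cycle, output) pairs.

    A sample needs len(stages) cycles to reach the output; after that,
    one result emerges on every cycle.
    """
    regs: List[Optional[int]] = [None] * len(stages)
    outputs: List[Tuple[int, int]] = []
    feed = list(inputs)
    cycle = 0
    while feed or any(r is not None for r in regs):
        cycle += 1
        # Update from the last stage to the first so each stage reads the
        # value its predecessor held during the previous cycle.
        for i in reversed(range(len(stages))):
            if i == 0:
                src = feed.pop(0) if feed else None
            else:
                src = regs[i - 1]
            regs[i] = None if src is None else stages[i](src)
        if regs[-1] is not None:
            outputs.append((cycle, regs[-1]))
    return outputs
```

With three stages and four inputs, results appear on cycles 3, 4, 5, and 6: three cycles of latency, then one result per cycle. In general, N samples finish after N + 2 cycles.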
- A single RAPC therefore performs at least 111 times faster than a single core of a conventional processing system. The ICAT processes one pixel each clock cycle, whereas the conventional processor needs 37 instructions × 3 clock cycles per instruction, or 111 clock cycles per pixel. Since two thousand (or more) RAPC processors may be implemented on a single ICAT chip, the combined processing power could be at least 222,000 times that of a single-core conventional processor. Current conventional processors are limited to quad core or the like, and adding cores to a conventional processor brings additional overhead. Many more RAPCs than conventional processing cores can be added, and each can be reconfigured as a pipeline, alone or together with other RAPCs.
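The speedup figures above follow from the stated numbers. The worked arithmetic below assumes, as the text does implicitly, equal clock rates and perfect linear scaling across RAPCs. It also ignores the three-cycle pipeline fill, which is negligible over a video stream.

```latex
\begin{aligned}
t_{\text{conv}} &= 37~\tfrac{\text{instructions}}{\text{pixel}} \times 3~\tfrac{\text{cycles}}{\text{instruction}} = 111~\tfrac{\text{cycles}}{\text{pixel}} \\
t_{\text{RAPC}} &= 1~\tfrac{\text{cycle}}{\text{pixel}} \quad \text{(after the 3-cycle pipeline fill)} \\
S_{\text{RAPC}} &= \frac{t_{\text{conv}}}{t_{\text{RAPC}}} = 111 \\
S_{\text{chip}} &= 2000 \times S_{\text{RAPC}} = 222{,}000
\end{aligned}
```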
- Any competent high level language programmer can program ICAT technology, because the front-end microprocessor architecture is a familiar, general purpose architecture.
- The RAPCs are configured by the general purpose processor and the AMPC. The AMPC uses the standard structure of each RAPC to reconfigure one or more RAPCs based on standard code for the front-end processor, as illustrated in the diagram of FIG. 7, for example.
- ICAT technology, comprising a plurality of RAPCs and an AMPC that configures and reconfigures them behind a standard, outward-facing processor architecture, is a surprising and unexpected advance over conventional processors and any known FPGA processors.
- Each RAPC may comprise a DSP, an LDP and an MBS.
- A DSP may have a setup interface for programming the types of operations required (e.g., integer and floating point multiply, divide, add, subtract, etc.).
- A DSP may have four inputs for operand data that can be concatenated or operated on with various combinations of mathematic functions as determined by the setup data, such as illustrated in FIG. 8.
- The DSP may have a 48-bit accumulator, which is output as result data along with status data. Status data includes carry-out, equal, greater than, and less than, for example.
- An LDP may have a setup interface for programming the lookup table, the loop counter, and the constant register, for example.
- The LDP may have a loop counter for detecting when iterative algorithms are completed.
- The LDP may have a register that can hold constant data for input to the lookup table.
- The LDP may have a block of memory that can be used to perform functions.
- LUT functions may include three kinds of table. The first is a lookup table that is sequentially accessed using the loop counter. The second is a lookup table that is accessed by the DSP status, the constant register, or the DSP result data for control purposes. The third is a logic lookup table that outputs miscellaneous logic signals for control purposes.
- The LDP may pass result data from its input to its output.
- The LDP may have one pipeline register for result data at its output, for example. Alternatively, it may have two pipeline registers with synchronous clear enables for result data at its output.
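The LDP features listed above can be sketched as a small model. It passes the DSP result through, reads a lookup table sequentially using the loop counter, raises a done flag at a terminal count, and compares against the constant register. Field names, the modulo wraparound of the LUT address, and the done-flag rule are hypothetical choices, not the patent's design.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class LogicalDecisionProcessor:
    """Hypothetical LDP model: lookup table, loop counter and constant register."""
    lut: List[int]      # contents programmed through the setup interface
    loop_limit: int     # terminal count for iterative algorithms
    constant: int = 0   # constant register
    counter: int = field(default=0, init=False)

    def step(self, result: int) -> Tuple[int, int, bool]:
        """Pass the DSP result through and report (result, lut_value, loop_done).

        The LUT is read sequentially, addressed by the loop counter, and
        loop_done goes high when the counter reaches its terminal count.
        """
        lut_value = self.lut[self.counter % len(self.lut)]
        self.counter += 1
        done = self.counter >= self.loop_limit
        return result, lut_value, done

    def greater_than_constant(self, result: int) -> bool:
        """Decision using the constant register: is the result above the constant?"""
        return result > self.constant
```

For example, an LDP with `lut=[5, 6, 7]` and `loop_limit=3` returns LUT values 5, 6, and 7 on successive steps and signals completion on the third.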
- 3325 RAPCs may be configured on a single Xilinx® Zynq® FPGA chip running at a modest clock rate of 100 MHz, where Xilinx® and Zynq® are trademarks of Xilinx, Inc.
- Each of the RAPCs can process one or two logic operations and a mathematic operation.
- This configuration produces 332 GigaFLOPS.
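The 332 GigaFLOPS figure follows from the RAPC count and clock rate. The arithmetic below assumes each RAPC completes one mathematic (floating point) operation per clock cycle and counts only that operation, not the one or two logic operations.

```latex
\begin{aligned}
R &= N_{\text{RAPC}} \times f_{\text{clk}} \times n_{\text{ops/cycle}} \\
  &= 3325 \times \left(100 \times 10^{6}~\text{s}^{-1}\right) \times 1 \\
  &= 3.325 \times 10^{11}~\text{operations/s} \approx 332~\text{GigaFLOPS}
\end{aligned}
```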
- This configuration uses lookup tables (LUTs) for each of four mathematical operations (e.g., add, subtract, multiply, divide) and four logic operations (e.g., greater than, less than, equal, not equal).
- The standard LUT memory size is 512 bytes.
- A “greater than a configurable constant value” LUT may be provided in addition to the other logic operation LUTs.
- The output signals of the LUTs control the bus multiplexor switches that steer results between RAPCs.
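LUT-driven steering can be sketched as follows. A logic-operation LUT is addressed by the DSP status flags, and its output becomes the select line of a bus multiplexor. The 4-bit status encoding and the 16-entry table are hypothetical simplifications chosen for readability; the text states a standard LUT size of 512 bytes. The destination names are also invented.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical bit positions of the DSP status word.
CARRY, EQUAL, GREATER, LESS = 1, 2, 4, 8


def build_select_lut(condition: str) -> List[int]:
    """Fill a 16-entry LUT indexed by the 4-bit status word.

    Each entry is a 2:1 bus-mux select: 1 steers the result to the 'true'
    destination, 0 to the 'false' destination.
    """
    tests: Dict[str, Callable[[int], bool]] = {
        "gt": lambda s: bool(s & GREATER),
        "lt": lambda s: bool(s & LESS),
        "eq": lambda s: bool(s & EQUAL),
        "ne": lambda s: not (s & EQUAL),
    }
    test = tests[condition]
    return [int(test(status)) for status in range(16)]


def status_word(a: int, b: int) -> int:
    """Comparison flags a DSP might report for a versus b (carry left clear)."""
    return (EQUAL if a == b else 0) | (GREATER if a > b else 0) | (LESS if a < b else 0)


def steer(lut: Sequence[int], a: int, b: int, destinations: Sequence[str]) -> str:
    """Look up the mux select for this status and return destinations[select]."""
    return destinations[lut[status_word(a, b)]]
```

Because the decision is a table lookup rather than a branch, changing the condition from "greater than" to "not equal" only means reloading the LUT contents. The routing hardware stays the same.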
- The AMPC compiler precompiles source code in a higher level programming language written for a von Neumann architecture and selects LUTs for each operation performed by a RAPC. It thereby generates a non-von Neumann processor from source code written for the von Neumann architecture.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computer Hardware Design (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Evolutionary Computation (AREA)
- Geometry (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Analysis (AREA)
- Computational Mathematics (AREA)
- Advance Control (AREA)
- Logic Circuits (AREA)
- Devices For Executing Special Programs (AREA)
- Stored Programmes (AREA)
- Microcomputers (AREA)
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/416,972 US20170212739A1 (en) | 2016-01-26 | 2017-01-26 | Processor With Reconfigurable Pipelined Core And Algorithmic Compiler |
US15/919,885 US10515041B2 (en) | 2016-01-26 | 2018-03-13 | Processor with reconfigurable pipelined core and algorithmic compiler |
US16/675,876 US10970245B2 (en) | 2016-01-26 | 2019-11-06 | Processor with reconfigurable pipelined core and algorithmic compiler |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201662287265P | 2016-01-26 | 2016-01-26 | |
US15/416,972 US20170212739A1 (en) | 2016-01-26 | 2017-01-26 | Processor With Reconfigurable Pipelined Core And Algorithmic Compiler |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/919,885 Continuation US10515041B2 (en) | 2016-01-26 | 2018-03-13 | Processor with reconfigurable pipelined core and algorithmic compiler |
Publications (1)
Publication Number | Publication Date |
---|---|
US20170212739A1 true US20170212739A1 (en) | 2017-07-27 |
Family
ID=59359078
Family Applications (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/416,972 Abandoned US20170212739A1 (en) | 2016-01-26 | 2017-01-26 | Processor With Reconfigurable Pipelined Core And Algorithmic Compiler |
US15/919,885 Active US10515041B2 (en) | 2016-01-26 | 2018-03-13 | Processor with reconfigurable pipelined core and algorithmic compiler |
US16/675,876 Active US10970245B2 (en) | 2016-01-26 | 2019-11-06 | Processor with reconfigurable pipelined core and algorithmic compiler |
Family Applications After (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/919,885 Active US10515041B2 (en) | 2016-01-26 | 2018-03-13 | Processor with reconfigurable pipelined core and algorithmic compiler |
US16/675,876 Active US10970245B2 (en) | 2016-01-26 | 2019-11-06 | Processor with reconfigurable pipelined core and algorithmic compiler |
Country Status (17)
Country | Link |
---|---|
US (3) | US20170212739A1 (fr) |
EP (1) | EP3408737A4 (fr) |
JP (1) | JP7015249B2 (fr) |
KR (1) | KR20180132044A (fr) |
CN (1) | CN108885543A (fr) |
AU (1) | AU2017211781B2 (fr) |
BR (1) | BR112018015276A2 (fr) |
CA (1) | CA3012781C (fr) |
CL (1) | CL2018002025A1 (fr) |
CO (1) | CO2018008835A2 (fr) |
IL (1) | IL279302B2 (fr) |
MX (1) | MX2018009255A (fr) |
MY (1) | MY191841A (fr) |
PH (1) | PH12018501591A1 (fr) |
RU (1) | RU2018130817A (fr) |
SG (1) | SG11201806395SA (fr) |
WO (1) | WO2017132385A1 (fr) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108958852A (zh) * | 2018-07-16 | 2018-12-07 | 济南浪潮高新科技投资发展有限公司 | A system optimization method based on an FPGA heterogeneous platform |
US20210076985A1 (en) * | 2019-09-13 | 2021-03-18 | DePuy Synthes Products, Inc. | Feature-based joint range of motion capturing system and related methods |
US20210113097A1 (en) * | 2018-04-03 | 2021-04-22 | Nec Corporation | Heart failure degree-of-exacerbation determination system and heart failure degree-of-exacerbation determination method |
US20210127994A1 (en) * | 2018-07-20 | 2021-05-06 | Omron Healthcare Co., Ltd. | Biometric data measurement system and biometric data measurement method |
US20210315459A1 (en) * | 2018-08-17 | 2021-10-14 | Koninklijke Philips N.V. | System and method for providing an indication of a person's gum health |
CN113703843A (zh) * | 2021-09-24 | 2021-11-26 | 中国人民解放军军事科学院军事医学研究院 | A register data processing method and apparatus, and memory |
US20210383928A1 (en) * | 2020-06-05 | 2021-12-09 | Samsung Electronics Co., Ltd. | Apparatus and method for estimating bio-information |
US20210401332A1 (en) * | 2018-11-15 | 2021-12-30 | My-Vitality Sàrl | Self-monitoring and care assistant for achieving glycemic goals |
US20220233093A1 (en) * | 2021-01-22 | 2022-07-28 | AsthmaTek, Inc. | Systems and methods to provide a physician interface that enables a physician to assess asthma of a subject and provide therapeutic feedback |
US20220330835A1 (en) * | 2021-04-08 | 2022-10-20 | Wistron Corporation | Hybrid body temperature measurement system and method thereof |
US20230018671A1 (en) * | 2021-06-14 | 2023-01-19 | Tata Consultancy Services Limited | Method and system for personalized eye blink detection |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101996842B1 (ko) * | 2018-12-26 | 2019-07-08 | (주)자람테크놀로지 | RISC-V-based computing device coupled with high-speed hardware computation and supporting a user-defined instruction set, and method therefor |
US11080227B2 (en) * | 2019-08-08 | 2021-08-03 | SambaNova Systems, Inc. | Compiler flow logic for reconfigurable architectures |
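CN113222126B (zh) * | 2020-01-21 | 2022-01-28 | 上海商汤智能科技有限公司 | Data processing device and artificial intelligence chip |
CN111444159B (zh) * | 2020-03-03 | 2024-05-03 | 中国平安人寿保险股份有限公司 | Actuarial data processing method and apparatus, electronic device, and storage medium |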
US11809908B2 (en) | 2020-07-07 | 2023-11-07 | SambaNova Systems, Inc. | Runtime virtualization of reconfigurable data flow resources |
CN111813526A (zh) * | 2020-07-10 | 2020-10-23 | 深圳致星科技有限公司 | Heterogeneous processing system, processor, and task processing method for federated learning |
US11782729B2 (en) | 2020-08-18 | 2023-10-10 | SambaNova Systems, Inc. | Runtime patching of configuration files |
CN117311247B (zh) * | 2023-11-30 | 2024-03-26 | 山东盛泰矿业科技有限公司 | A control device for underground mining |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030066057A1 (en) * | 2001-02-23 | 2003-04-03 | Rudusky Daryl | System, method and article of manufacture for collaborative hardware design |
Family Cites Families (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4291372A (en) * | 1979-06-27 | 1981-09-22 | Burroughs Corporation | Microprocessor system with specialized instruction format |
US5684980A (en) * | 1992-07-29 | 1997-11-04 | Virtual Computer Corporation | FPGA virtual computer for executing a sequence of program instructions by successively reconfiguring a group of FPGA in response to those instructions |
US5966534A (en) * | 1997-06-27 | 1999-10-12 | Cooke; Laurence H. | Method for compiling high level programming languages into an integrated processor with reconfigurable logic |
US20060081971A1 (en) * | 1997-09-30 | 2006-04-20 | Jeng Jye Shau | Signal transfer methods for integrated circuits |
US6718457B2 (en) * | 1998-12-03 | 2004-04-06 | Sun Microsystems, Inc. | Multiple-thread processor for threaded software applications |
TW463175B (en) * | 2000-03-01 | 2001-11-11 | Winbond Electronics Corp | Memory processing method and system |
US7000213B2 (en) * | 2001-01-26 | 2006-02-14 | Northwestern University | Method and apparatus for automatically generating hardware from algorithms described in MATLAB |
WO2005026925A2 (fr) * | 2002-05-21 | 2005-03-24 | Washington University | Intelligent data storage and processing using FPGA devices |
JP2006065786A (ja) * | 2004-08-30 | 2006-03-09 | Sanyo Electric Co Ltd | Processing device |
US7818725B1 (en) * | 2005-04-28 | 2010-10-19 | Massachusetts Institute Of Technology | Mapping communication in a parallel processing environment |
US7843215B2 (en) * | 2007-03-09 | 2010-11-30 | Quadric, Inc. | Reconfigurable array to compute digital algorithms |
US8214814B2 (en) * | 2008-06-24 | 2012-07-03 | International Business Machines Corporation | Sharing compiler optimizations in a multi-node system |
US20130212366A1 (en) * | 2012-02-09 | 2013-08-15 | Altera Corporation | Configuring a programmable device using high-level language |
JP2014016894A (ja) * | 2012-07-10 | 2014-01-30 | Renesas Electronics Corp | Parallel computing device, data processing system including the parallel computing device, and data processing program |
US9218289B2 (en) * | 2012-08-06 | 2015-12-22 | Qualcomm Incorporated | Multi-core compute cache coherency with a release consistency memory ordering model |
2017
- 2017-01-26 JP JP2018558111A patent/JP7015249B2/ja active Active
- 2017-01-26 MY MYPI2018702593A patent/MY191841A/en unknown
- 2017-01-26 KR KR1020187024664A patent/KR20180132044A/ko not_active Application Discontinuation
- 2017-01-26 EP EP17744897.4A patent/EP3408737A4/fr active Pending
- 2017-01-26 SG SG11201806395SA patent/SG11201806395SA/en unknown
- 2017-01-26 US US15/416,972 patent/US20170212739A1/en not_active Abandoned
- 2017-01-26 CN CN201780020270.0A patent/CN108885543A/zh active Pending
- 2017-01-26 CA CA3012781A patent/CA3012781C/fr active Active
- 2017-01-26 WO PCT/US2017/015143 patent/WO2017132385A1/fr active Application Filing
- 2017-01-26 RU RU2018130817A patent/RU2018130817A/ru unknown
- 2017-01-26 MX MX2018009255A patent/MX2018009255A/es unknown
- 2017-01-26 BR BR112018015276A patent/BR112018015276A2/pt not_active Application Discontinuation
- 2017-01-26 AU AU2017211781A patent/AU2017211781B2/en active Active

2018
- 2018-03-13 US US15/919,885 patent/US10515041B2/en active Active
- 2018-07-26 PH PH12018501591A patent/PH12018501591A1/en unknown
- 2018-07-26 CL CL2018002025A patent/CL2018002025A1/es unknown
- 2018-08-24 CO CONC2018/0008835A patent/CO2018008835A2/es unknown

2019
- 2019-11-06 US US16/675,876 patent/US10970245B2/en active Active

2020
- 2020-12-08 IL IL279302A patent/IL279302B2/en unknown
Also Published As
Publication number | Publication date |
---|---|
US20200142851A1 (en) | 2020-05-07 |
SG11201806395SA (en) | 2018-08-30 |
BR112018015276A2 (pt) | 2018-12-18 |
IL279302B2 (en) | 2023-06-01 |
AU2017211781B2 (en) | 2021-04-22 |
RU2018130817A3 (fr) | 2020-04-16 |
PH12018501591A1 (en) | 2019-04-08 |
CO2018008835A2 (es) | 2018-11-13 |
EP3408737A1 (fr) | 2018-12-05 |
CA3012781C (fr) | 2022-08-30 |
US10515041B2 (en) | 2019-12-24 |
RU2018130817A (ru) | 2020-02-27 |
KR20180132044A (ko) | 2018-12-11 |
CL2018002025A1 (es) | 2019-02-08 |
US10970245B2 (en) | 2021-04-06 |
US20180246834A1 (en) | 2018-08-30 |
EP3408737A4 (fr) | 2019-09-11 |
WO2017132385A1 (fr) | 2017-08-03 |
MX2018009255A (es) | 2019-03-18 |
JP2019506695A (ja) | 2019-03-07 |
IL279302A (en) | 2021-01-31 |
CA3012781A1 (fr) | 2017-08-03 |
MY191841A (en) | 2022-07-18 |
AU2017211781A1 (en) | 2018-09-13 |
CN108885543A (zh) | 2018-11-23 |
JP7015249B2 (ja) | 2022-02-02 |
Similar Documents
Publication | Title |
---|---|
US10970245B2 (en) | Processor with reconfigurable pipelined core and algorithmic compiler |
US11436186B2 (en) | High throughput processors |
Johnson et al. | General-purpose systolic arrays |
Stitt | Are field-programmable gate arrays ready for the mainstream? |
De Bosschere et al. | High-performance embedded architecture and compilation roadmap |
Jesshope et al. | Design of SIMD microprocessor array |
Giefers et al. | An FPGA-based reconfigurable mesh many-core |
Capalija et al. | A coarse-grain FPGA overlay for executing data flow graphs |
Pfenning et al. | Transparent FPGA acceleration with TensorFlow |
Boppu | Code Generation for Tightly Coupled Processor Arrays |
Aklah | A hybrid partially reconfigurable overlay supporting just-in-time assembly of custom accelerators on FPGAs |
Baklouti et al. | Synchronous communication-based Many-core SoC |
Guccione | Software for Reconfigurable Computing |
Adário et al. | Reconfigurable computing: Viable applications and trends |
Shannon | Reconfigurable computing architectures |
Chickerur et al. | Reconfigurable Computing: A Review |
Schwiegelshohn et al. | Reconfigurable Processors and Multicore Architectures |
Cardoso | Data-driven array architectures: a rebirth? |
Verdoscia et al. | A Data-Flow Soft-Core Processor for Accelerating Scientific Calculation on FPGAs |
Miyajima | A Toolchain for Application Acceleration on Heterogeneous Platforms |
Yue | FPGA Overlay Architectures on the Xilinx Zynq as Programmable Accelerators |
Sivashanmugam | Special Purpose Hardware for Image Processing |
Moeller | Computational Modeling and Simulation of Reconfigurable Responsive Embedded Computing Systems |
Valero | Message from the HiPEAC coordinator |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: ICAT LLC, INDIANA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: CATILLER, ROBERT; REEL/FRAME: 041504/0686. Effective date: 20170307 |
STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |