CN114730173A - System, apparatus and method for conveyor belt processing - Google Patents

System, apparatus and method for conveyor belt processing

Info

Publication number
CN114730173A
CN114730173A (application CN202080062645.1A)
Authority
CN
China
Prior art keywords
reprogrammable
data bus
data
block
processors
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202080062645.1A
Other languages
Chinese (zh)
Inventor
齐亚·史莱蒙
尼科·史莱蒙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Six Nuclear Co
Original Assignee
Six Nuclear Co
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Six Nuclear Co
Publication of CN114730173A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/80 Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B19/00 Programme-control systems
    • G05B19/02 Programme-control systems electric
    • G05B19/18 Numerical control [NC], i.e. automatically operating machines, in particular machine tools, e.g. in a manufacturing environment, so as to execute positioning, movement or co-ordinated operations by means of programme data in numerical form
    • G05B19/4155 Numerical control [NC], i.e. automatically operating machines, in particular machine tools, e.g. in a manufacturing environment, so as to execute positioning, movement or co-ordinated operations by means of programme data in numerical form, characterised by programme execution, i.e. part programme or machine function execution, e.g. selection of a programme
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/20 Handling requests for interconnection or transfer for access to input/output bus
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2213/00 Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F2213/40 Bus coupling

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Manufacturing & Machinery (AREA)
  • Automation & Control Theory (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Stored Programmes (AREA)

Abstract

Reconfigurable hardware platforms use chains of reconfigurable hardware operator blocks in place of portions of software to manipulate data as it moves down the chain. Such a carousel architecture or chain of operator blocks moves data between operator blocks. Such a conveyor architecture processor may be combined with a conventional front-end processor to process complex information or critical loops in hardware while processing the rest of the program as software.

Description

System, apparatus and method for conveyor belt processing
RELATED APPLICATIONS
This application claims priority to and the benefit of U.S. provisional patent application No. 62/896,682, filed September 6, 2019, which is hereby incorporated by reference in its entirety.
Technical Field
The present disclosure relates to computers and more particularly to computer processors.
Background
Digital computers designed for general-purpose computing may use standard architectures, such as the von Neumann architecture. The von Neumann architecture, designed by the physicist and mathematician John von Neumann around 1945, can serve as a theoretical design for a stored-program digital computer.
Brief Description of Drawings
FIG. 1 shows a diagram illustrating an example of a computing system.
FIG. 2 shows a diagram illustrating a conveyor belt architecture computing system.
Fig. 3 shows the program source code for printing Fibonacci numbers.
Fig. 4 shows the program machine code for printing Fibonacci numbers executed on a standard architecture system.
Fig. 5 shows a flow diagram of the operator blocks for printing Fibonacci numbers executed on a carousel architecture system.
Figure 6 shows the source code that calculates and prints out a digit sum.
Figure 7 shows the first quarter of the machine code that performs the calculation and prints out the digit sum on a standard architecture system.
Figure 8 shows the second quarter of the machine code that performs the calculation and prints out the digit sum on a standard architecture system.
Figure 9 shows the third quarter of the machine code that performs the calculation and prints out the digit sum on a standard architecture system.
FIG. 10 shows the fourth quarter of the machine code that performs the calculation and prints out the digit sum on a standard architecture system.
FIG. 11 shows a flow diagram of the operator blocks that perform the calculation and print out the digit sum on a carousel architecture system.
FIG. 12 shows a block diagram illustrating a carousel architecture computing system for use in conjunction with a standard architecture computing system.
FIG. 13 shows a block diagram illustrating how a program may be executed across a carousel architecture computing system and a standard architecture computing system.
FIG. 14 shows a flow diagram of a method for preparing a carousel architecture.
FIG. 15 is a block diagram illustrating a computing system and components.
Detailed Description
The following provides a detailed description of systems and methods consistent with embodiments of the present disclosure. While several embodiments are described, it should be understood that the present disclosure is not limited to any one embodiment, but encompasses numerous alternatives, modifications, and equivalents. In addition, while numerous specific details are set forth in the following description in order to provide a thorough understanding of the embodiments disclosed herein, some embodiments may be practiced without some or all of these details. Moreover, for the purpose of clarity, certain technical material that is known in the prior art has not been described in detail in order to avoid unnecessarily obscuring the present disclosure.
Techniques, apparatuses, and methods are disclosed for using a reconfigurable hardware platform to interconnect a chain of reconfigurable hardware operator blocks that manipulate data as it moves down the chain. Such a carousel architecture, or chain of operator blocks, moves data between operator blocks. In place of portions of software, the chain of reconfigurable hardware operator blocks manipulates data as it moves down the chain.
In some embodiments, the conveyor architecture computing system may be used solely to perform computing tasks.
Multiple conveyor belt architecture computing systems may be used in series or in parallel, for example, to share a workload among the computing systems.
The carousel architecture computing system may be used in conjunction with standard architecture computing systems, for example, workloads may be shared among the computing systems.
Multiple conveyor belt architecture computing systems may be used in series or in parallel, and in conjunction with standard architecture computing systems, for example, to share workloads between computing systems.
For example, a carousel architecture computing system may be used in conjunction with multiple standard architecture computing systems and share workloads between the computing systems.
Fig. 1 shows a diagram illustrating an example of a computing system similar to or including a von Neumann architecture computing system. The computing system includes an input 102, a computing system 104, and an output 106. The input 102 is received by the computing system 104 (e.g., via a bus, etc.) and processed at the computing system 104 before being sent from the computing system 104 as output 106 (e.g., via a bus, etc.). Included within computing system 104 is a Random Access Memory (RAM) 108, which is coupled to a Central Processing Unit (CPU) 112 via a common bus 110. In addition, the CPU 112 includes an Arithmetic Logic Unit (ALU) 116, a control unit 114, registers 118, and a stack 120.
A program executing on a standard architecture computing system may include a set of instructions that are executed in a particular order to manipulate data.
Once the program is loaded into RAM 108, CPU 112 may perform a series of "fetch-decode-execute" cycles, whereby the contents of RAM 108 locations are read, decoded, and then executed in the particular order specified by the program. Since the locations in RAM 108 contain both instructions and data, CPU 112 reads and decodes each instruction to determine how to process the information, and then executes accordingly. Some instructions tell the CPU 112 to write the result of an operation back to a RAM 108 location, while others tell the CPU 112 to jump to a particular location in RAM 108 based on the result of the previous instruction.
A problem with this architecture may be that program instructions and data are contained in the same RAM 108. The information in RAM 108 can be read only one item at a time and must be decoded, resulting in architectural inefficiency and performance limitations. Further, the common bus 110 may not allow CPU 112 to read and write information simultaneously. This is known as a bottleneck and may further limit the performance of the system.
FIG. 2 shows a diagram illustrating a carousel architecture computing system including an input 202, a computing system 204, and an output 206. The input 202 is received by the computing system 204 (e.g., via a bus, etc.) and processed at the computing system 204 before being sent from the computing system 204 (e.g., via a bus, etc.) as the output 206. Included within the computing system 204 is a reconfigurable hardware platform 208 (e.g., a Field Programmable Gate Array (FPGA)) that includes a number of reconfigurable operator blocks 210, 212, 214, 216, and 218 interconnected by data paths 220, 222, 224, and 226 in one direction and data path 228 in the opposite direction.
Instead of a CPU coupled to RAM via a bus, the carousel architecture may use a reconfigurable hardware platform (such as FPGA 208) to interconnect the chain of reconfigurable operator blocks 210, 212, 214, 216, and 218 to manipulate data as it moves between operator blocks on data paths 220, 222, 224, and 226 in one direction along the chain and on data path 228 in the opposite direction.
In this embodiment, at each operator block 210, 212, 214, 216, and 218, an operation or set of operations is performed to manipulate the data before it is carried to the next operator block in the chain on data paths 220, 222, 224, and 226 in one direction, or on data path 228 in the opposite direction.
The program is translated and then copied into a reconfigurable hardware platform 208 (e.g., FPGA, etc.). Each instruction or group of instructions is assigned to an operator block 210, 212, 214, 216, and 218, and program flow is determined by the interconnection of these operator blocks.
As data flows along the chain between operator blocks on data paths 220, 222, 224, and 226, the data is manipulated at each of the operator blocks 210, 212, 214, 216, and 218.
In the case of a "jump" instruction, the data stream may be changed or redirected by an operator block in the opposite direction, or to some other operator block, via the separate data path 228. In this example, a jump taken when a condition is satisfied is shown from operator block 4 (216) back to operator block 2 (212).
Further, the operator blocks 210, 212, 214, 216, and 218 may be autonomous and capable of processing data asynchronously or synchronously as they receive data from a previous operator block in the chain.
In one embodiment of autonomous operation, the carousel architecture allows multiple instructions to be executed in a single processor cycle.
The carousel architecture may be more efficient than the standard architecture because it does not require reading the program from RAM and decoding it.
The carousel architecture may avoid the bottlenecks associated with conventional computer architectures because it does not rely on a common bus path and each set of operator blocks has its own data path.
The carousel architecture may allow for higher throughput and processing power. Another advantage of this architecture is that, operating in a synchronous mode, the carousel architecture can pack data more densely into a reconfigurable hardware platform by queuing the data at each operator block input, ready to be loaded into subsequent operator blocks as they become available.
Program instructions may be contained in the operator blocks in the form of hardware logic gates, rather than in software, which makes the instructions execute much faster than their software counterparts.
Another benefit of the carousel architecture is that the program may be more difficult to crack. The program may be stored as hardware and any modification of the program by a hacker may break the chain of the carousel and cause a system reset. A system reset may cause the system to automatically reload the original (unaltered) program into the reconfigurable hardware platform.
Figs. 3-11 illustrate the differences between the two architectures. For two different programs, the C source code and the compiled output were compared between a more traditional computing system and a carousel computing system.
Fig. 3 shows source code for printing Fibonacci numbers. The C source code of a program for printing out the Fibonacci numbers in the range 0 to 255 is shown.
Fig. 4 shows machine code for printing Fibonacci numbers. The C source code in fig. 3 may be compiled for execution on a conventional computing system. The resulting machine language may look similar to the listing shown in fig. 4. A conventional computing system may use at least 85 CPU clock cycles to complete the first iteration of the computation-and-print cycle. Thereafter, the conventional computing system may use at least 56 CPU cycles to complete each subsequent iteration.
Fig. 5 shows a flow chart of the operator blocks for printing Fibonacci numbers. In contrast to fig. 4, the C source code shown in fig. 3 may be compiled for execution on a carousel computing system. The resulting operator blocks for executing the program may be similar to those shown in FIG. 5.
Operator block 1 (OB #1) 502 assigns the values x = 0 and y = 1. Operator block 2 (OB #2) 504 performs the "printf" function. Operator block 3 (OB #3) 506 adds the contents of x and y and assigns the result to variable z; it also assigns y to x and z to y. Operator block 4 (OB #4) 508 performs a conditional jump: if "x < 255" is true, it jumps back to the beginning of operator block 2 (504), and if "x < 255" is false, it jumps back to the beginning of operator block 1 (502).
In this embodiment, multiple instructions may be combined in a single operator block 502, 504, 506, and 508, allowing multiple operations to be performed on the data before it is passed to the next operator block. When operating in synchronous mode, the conveyor architecture may use four processor clock cycles to complete the first and each subsequent iteration of the computation-and-print cycle. This may allow the conveyor architecture machine in this example to operate 14 times faster than a conventional machine at a similar clock rate (i.e., 56 cycles versus 4 cycles).
Fig. 6 shows the source code for finding a digit sum. The C source code for a program that uses recursion to find the digit sum of a number is shown in fig. 6.
Figures 7-10 illustrate machine code for finding the digit sum. The C source code in fig. 6 may be compiled for execution on a von Neumann computing system. The resulting machine language may look similar to the listings shown in figs. 7-10. The "main" loop invokes a separate "sum" loop 802 to compute and return a result 1002. Within the sum loop, an "if" statement 902 is included. Depending on the result of the if statement, the digital computer may use 113 or 191 CPU clock cycles to process a single iteration.
Fig. 11 shows a flow chart of the operator blocks for finding the digit sum. In contrast to figs. 7-10, the C source code shown in fig. 6 may be compiled for execution on a carousel architecture computing system. The resulting operator blocks for executing the program may be similar to those shown in fig. 11.
Operator block 1 (OB #1) 1102 executes the "printf" function to print "Enter the number" on the output device. Operator block 2 (OB #2) 1104 executes a "scanf" function to read a number from an input device. Operator block 3 (OB #3) 1106 executes an "if" statement that compares the input number with 0, then redirects the program to operator block 4 (OB #4) 1108 if the result is positive, or to operator block 5 (OB #5) 1110 if the result is negative. Operator block 4 (OB #4) 1108 performs the calculation. Operator block 5 (OB #5) 1110 returns 0. Operator block 6 (OB #6) 1112 assigns the number returned by OB #4 (1108) or OB #5 (1110) to the variable "sum". Operator block 7 (OB #7) 1114 executes the "printf" function to print sum on the output device. In addition, the output of operator block 1114 is coupled to the input of OB #1 (1102) to allow the program to loop indefinitely.
In this embodiment, an operator block may redirect the program chain according to the result of a condition. When operating in synchronous mode, the carousel architecture computing system may use six processor clock cycles to complete an iteration of the program, regardless of the result of the "if" instruction. The carousel architecture computing system in this example may be about 18 times faster than the standard architecture computing system at a similar clock rate (i.e., 113 cycles versus 6 cycles).
Depending on the application, a carousel architecture computing system may be much faster than a standard architecture computing system. For example, a carousel architecture computing system may be faster in applications that process large amounts of data. The performance advantage of a carousel architecture computing system over a standard architecture computing system may depend on the program being executed. It has been noted through testing that in certain applications, advantages of 100% to 2,000% are possible.
FIG. 12 shows a block diagram illustrating a carousel architecture computing system for use in conjunction with a standard architecture computing system. In this embodiment, the standard architecture computing system front end 1202 is coupled to the carousel architecture computing system back end 1204 via a common bus 1206.
The standard architecture computing system front end includes the following components coupled together via a common bus arrangement 1206: a Central Processing Unit (CPU)1208, a Dynamic Random Access Memory (DRAM)1210, a Local Area Network (LAN) adapter 1212, a Basic Input and Output System (BIOS)1214, and a Hard Disk Drive (HDD) 1216. In the case of the HDD 1216, this is accomplished via an interface (I/F) 1218.
Also shown in this embodiment is a Graphics Processor Unit (GPU)1220 and another expansion processor 1222.
The carousel architecture computing system back end contains an FPGA 1224, the FPGA 1224 being coupled to the rest of the system through the common bus 1206.
Since some programs may be idle for most of their operating time, it makes little sense to execute such idle code in a carousel architecture computing system. Rather, only certain portions of the program (e.g., critical loops, critical paths) may be translated and executed in the conveyor architecture computing system to do the "heavy lifting". The remainder of the program, excluding the designated portions, may still be executed in the front-end standard architecture computing system. Using both architectures together may avoid the need to translate the entire program to operate on a carousel architecture computing system. This may prevent the use of valuable carousel architecture computing system real estate (e.g., program space) where it would not provide any tangible benefit. Furthermore, the use of both architectures ensures compatibility with existing programs designed to execute on standard architecture computing systems.
In some embodiments, the conveyor architecture computing system may be used in conjunction with a standard architecture computing system. FIG. 13 shows a block diagram illustrating how a program may be executed across a carousel architecture computing system and a standard architecture computing system. The standard architecture computing system front end 1302 is coupled to the carousel architecture computing system back end 1304 via a bus 1306. The body of the program 1308 running on the front end then calls routines A 1312 and B 1314 on the conveyor belt architecture computing system via call functions 1316 and 1320, and receives the results back at 1318 and 1322, respectively.
FIG. 14 shows a flow diagram of a method for preparing a carousel architecture. The method may be performed by the systems and/or components described herein, including 204 in fig. 2. In block 1402, a conveyor belt system may receive a program configured to operate as software. In block 1404, the conveyor belt system may determine a first portion of the program to run in hardware and a second portion of the program to run as software. In block 1406, the conveyor belt system may determine, based on the first portion, a plurality of interconnected reprogrammable operator blocks, including one or more transformation functions that take input data from a previous data bus, perform one or more transformations on the input data, and output the transformed input data via an output data bus. In block 1408, the carousel system may configure the plurality of interconnected reprogrammable operator blocks to execute on one or more reprogrammable processors. In block 1410, the conveyor belt system may execute the second portion via one or more front-end processors. In block 1412, the carousel system may transmit first data from the one or more front-end processors to the one or more reprogrammable processors. In block 1414, the conveyor belt system may execute the first portion via the one or more reprogrammable processors. In block 1416, the conveyor belt system may transmit second data from the one or more reprogrammable processors to the one or more front-end processors. In block 1418, the conveyor belt system may determine result data based on the first data and the second data.
The conveyor belt system may determine, based on computational complexity, the first portion of the program to run in hardware. The result data may be the result of program execution. The second data may be derived from the first data based on the one or more transformations. The result data may be derived from the second data, and the second data may be derived from the first data. The one or more reprogrammable processors may be field programmable gate arrays. Transmitting the first data from the one or more front-end processors to the one or more reprogrammable processors may further include transmitting the first data via an expansion bus. Receiving a program configured to operate as software may also include compiling the program into executable software code, a hardware configuration, and communication code for transferring data between the executable software code and the hardware configuration.
The carousel processor may include an input data bus, a plurality of interconnected reprogrammable operator blocks, and an output data bus. The plurality of interconnected reprogrammable operator blocks may include: an input data bus of a first reprogrammable operator block, coupled to the output data bus or to the input data bus of a second reprogrammable operator block; an output data bus of the first reprogrammable operator block, coupled to the input data bus or to an output data bus of a third reprogrammable operator block; and one or more transformation functions that take input data from a previous data bus, perform one or more transformations on the input data, and output the transformed input data via an output data bus.
The bus widths of the plurality of interconnected reprogrammable operator blocks need not all be the same. The output of a subsequent block may be the input of a previous block. The second reprogrammable operator block and the third reprogrammable operator block may be the same block. The third reprogrammable operator block may precede the first reprogrammable operator block in execution order. The carousel processor may also include a programming interface configured to receive instructions for creating the plurality of interconnected reprogrammable operator blocks.
A system for processing data may include a plurality of processors and a management function, where the management function is configured to assign data to each processor. Each processor may include an input data bus, a plurality of interconnected reprogrammable operator blocks, and an output data bus. The interconnected reprogrammable operator blocks may include: an input data bus of a first reprogrammable operator block, coupled to the output data bus or to the input data bus of a second reprogrammable operator block; an output data bus of the first reprogrammable operator block, coupled to the input data bus or to an output data bus of a third reprogrammable operator block; and one or more transformation functions that take input data from a previous data bus, perform one or more transformations on the input data, and output the transformed input data via an output data bus.
The management function may also be configured to reconfigure the plurality of interconnected reprogrammable operator blocks. The management function may include a memory storing instructions to create the plurality of interconnected reprogrammable operator blocks of the plurality of processors. The claimed system may also include at least one front-end processor having a different architecture than the plurality of processors. The at least one front-end processor may comprise a general-purpose processor. The management function may include: a secure interface configured to receive configuration changes; and a non-secure interface configured to distribute data to one or more of the plurality of processors.
Fig. 15 is a block diagram illustrating components capable of reading instructions from a machine-readable or computer-readable medium (e.g., a machine-readable storage medium) and performing any one or more of the methodologies discussed herein, according to some example embodiments. In particular, fig. 15 shows a diagrammatic representation of hardware resources 1500, including one or more processors (or processor cores) 1510, one or more memory/storage devices 1520, and one or more communication resources 1530, each of which are communicatively coupled via a bus 1540.
Processor 1510 (e.g., a Central Processing Unit (CPU), a Reduced Instruction Set Computing (RISC) processor, a Complex Instruction Set Computing (CISC) processor, a Graphics Processing Unit (GPU), a Digital Signal Processor (DSP) such as a baseband processor, an Application Specific Integrated Circuit (ASIC), a Radio Frequency Integrated Circuit (RFIC), another processor, or any suitable combination thereof) may include, for example, processor 1512 and processor 1514. Memory/storage 1520 may include main memory, disk storage, or any suitable combination thereof.
The communication resources 1530 may include interconnection and/or network interface components or other suitable devices to communicate with one or more peripheral devices 1504 and/or one or more databases 1506 via a network 1508. For example, the communication resources 1530 may include wired communication components (e.g., for coupling via a Universal Serial Bus (USB)), cellular communication components, Near Field Communication (NFC) components, Bluetooth® components (e.g., Bluetooth® Low Energy), Wi-Fi® components, and other communication components.
The instructions 1550 may include software, programs, applications, applets, apps, or other executable code for causing at least any of the processors 1510 to perform any one or more of the methods discussed herein. The instructions 1550 may reside, completely or partially, within at least one of: the processor 1510 (e.g., within a cache memory of the processor), the memory/storage 1520, or any suitable combination thereof. Further, any portion of instructions 1550 may be transferred to hardware resources 1500 from any combination of peripheral devices 1504 and/or databases 1506. Thus, the memory of the processor 1510, the memory/storage 1520, the peripheral devices 1504, and the database 1506 are examples of computer-readable and machine-readable media.
As used herein, the term "circuitry" may refer to, belong to, or comprise an Application Specific Integrated Circuit (ASIC), an electronic circuit, a (shared, dedicated, or combined) processor and/or a (shared, dedicated, or combined) memory that executes one or more software or firmware programs, a combinational logic circuit, and/or other suitable hardware components that provide the described functionality. In some embodiments, the circuitry may be implemented in, or functions associated with, one or more software or firmware modules. In some embodiments, the circuitry may comprise logic operable, at least in part, in hardware.
Embodiments and implementations of the systems and methods described herein may include various operations that may be embodied in machine-executable instructions to be executed by a computer system. A computer system may include one or more general-purpose or special-purpose computers (or other electronic devices). The computer system may include hardware components that include specific logic for performing operations or may include a combination of hardware, software, and/or firmware.
The computer system and the computers in the computer system may be connected via a network. Suitable networks for configuration and/or use as described herein include one or more local area networks, wide area networks, metropolitan area networks, and/or the internet or IP networks, such as the world wide web, private internet, secure internet, value added networks, virtual private networks, extranets, intranets, or even standalone machines that communicate with other machines through physical transmission of media. In particular, a suitable network may be formed of part or all of two or more other networks, including networks using different hardware and network communication technologies.
One suitable network includes a server and one or more clients; other suitable networks may include other combinations of servers, clients, and/or peer nodes, and a given computer system may act as a client and as a server. Each network includes at least two computers or computer systems, such as servers and/or clients. The computer system may include a workstation, laptop, disconnectable mobile computer, server, mainframe, cluster, so-called "network computer" or "thin client", tablet, smartphone, personal digital assistant or other handheld computing device, an "intelligent" consumer electronic device or appliance, a medical device, or a combination thereof.
Suitable networks may include communications or networking software available from various vendors, and can operate over twisted pair, coaxial, or fiber optic cables, telephone lines, radio waves, satellites, microwave repeaters, modulated AC power lines, physical media transmission, and/or other data transmission "lines" known to those skilled in the art, using TCP/IP, SPX, IPX, and other protocols. The network may include smaller networks and/or be connectable to other networks through a gateway or similar mechanism.
The various techniques, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, magnetic or optical cards, solid-state memory devices, non-transitory computer-readable storage media, or any other machine-readable storage medium, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the various techniques. In the case of program code execution on programmable computers, the computing device may include a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. The volatile and non-volatile memory and/or storage elements can be RAM, EPROM, flash drives, optical drives, magnetic hard drives, or another medium for storing electronic data. One or more programs that may implement or utilize the various techniques described herein may use an Application Programming Interface (API), reusable controls, and the like. Such programs may be implemented in a high level procedural or object oriented programming language to communicate with a computer system. However, the programs can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language, and combined with hardware implementations.
Each computer system includes one or more processors and/or memories; the computer system may also include various input devices and/or output devices. The processor may comprise a general purpose device, such as an "off-the-shelf" microprocessor. The processor may comprise a special purpose processing device, such as an ASIC, SoC, SiP, FPGA, PAL, PLA, FPLA, PLD, or other customized or programmable device. The memory may include static RAM, dynamic RAM, flash memory, one or more flip-flops, ROM, CD-ROM, DVD, magnetic disk, magnetic tape, or magnetic, optical, or other computer storage media. Input devices may include a keyboard, mouse, touch screen, light pen, tablet, microphone, sensor, or other hardware with accompanying firmware and/or software. Output devices may include a monitor or other display, a printer, a speech or text synthesizer, a switch, a signal line, or other hardware with accompanying firmware and/or software.
It should be appreciated that many of the functional units described in this specification can be implemented as one or more components, and these terms are used to more particularly emphasize their implementation independence. For example, a component may be implemented as a hardware circuit comprising custom Very Large Scale Integration (VLSI) circuits or gate arrays or off-the-shelf semiconductors such as logic chips, transistors, or other discrete components. A component may also be implemented in programmable hardware devices such as field programmable gate arrays, programmable array logic, programmable logic devices or the like.
Components may also be implemented in software for execution by various types of processors. An identified component of executable code may, for instance, comprise one or more physical or logical blocks of computer instructions which may, for instance, be organized as an object, procedure, or function. Nevertheless, the executables of an identified component need not be physically located together, but may comprise disparate instructions stored in different locations which, when joined logically together, comprise the component and achieve the stated purpose for the component.
Indeed, a component of executable code may be a single instruction, or many instructions, and may even be distributed over several different code segments, among different programs, and across several memory devices. Similarly, operational data may be identified and illustrated herein within components, and may be embodied in any suitable form and organized within any suitable type of data structure. The operational data may be collected as a single data set, or may be distributed over different locations including over different storage devices, and may exist, at least partially, merely as electronic signals on a system or network. The components may be passive or active, including agents operable to perform desired functions.
Several aspects of the described embodiments will be illustrated as software modules or components. As used herein, a software module or component may include any type of computer instruction or computer executable code located within a memory device. For example, a software module may include one or more physical or logical blocks of computer instructions, which may be organized as a routine, program, object, component, data structure, etc., that performs one or more tasks or implements particular data types. It will be appreciated that software modules may be implemented in hardware and/or firmware instead of or in addition to software. One or more of the functional modules described herein may be separated into sub-modules and/or combined into a single or fewer number of modules.
In some embodiments, a particular software module may include different instructions stored in different locations of the memory device, different memory devices, or different computers, which collectively implement the described functionality of the module. Indeed, a module may comprise a single instruction, or many instructions, and may be distributed over several different code segments, among different programs, and across several memory devices. Some embodiments may be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, software modules may be located in local memory storage devices and/or remote memory storage devices. In addition, data bound or presented together in a database record may reside in the same memory device or across several memory devices, and may be linked together in record fields in the database across a network.
Reference throughout this specification to "an example" means that a particular feature, structure, or characteristic described in connection with the example is included in at least one embodiment of the present invention. Thus, the appearances of the phrase "in an example" in various places throughout this specification are not necessarily all referring to the same embodiment.
As used herein, a plurality of items, structural elements, compositional elements, and/or materials may be presented in a common list for convenience. However, these lists should be construed as though each member of the list is individually identified as a separate and unique member. Thus, no individual member of such list should be construed as a de facto equivalent of any other member of the same list solely based on their presentation in a common group without indications to the contrary. Furthermore, various embodiments and examples of the invention may be referenced herein along with alternatives for the various components thereof. It should be understood that such embodiments, examples, and alternatives are not to be construed as actual equivalents to each other, but are to be considered as independent and autonomous manifestations of the invention.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided, such as examples of materials, frequencies, sizes, lengths, widths, shapes, etc., to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, components, materials, and so forth. In other instances, well-known structures, materials, or operations are not shown or described in detail to avoid obscuring aspects of the invention.
It should be appreciated that the systems described herein include descriptions of specific embodiments. These embodiments may be combined into a single system, partially combined into other systems, split into multiple systems, or otherwise divided or combined. Additionally, it is contemplated that parameters, attributes, aspects, etc. of one embodiment may be used in another embodiment. The parameters, attributes, aspects, etc. are described in one or more embodiments for clarity only, and it should be recognized that parameters, attributes, aspects, etc. of one embodiment may be combined with or substituted for those of another embodiment unless specifically stated otherwise herein.
Although the foregoing has been described in some detail for purposes of clarity of understanding, it will be apparent that certain changes and modifications may be made without departing from the principles thereof. It should be noted that there are many alternative ways of implementing the processes and apparatuses described herein. Accordingly, the present embodiments are to be considered as illustrative and not restrictive, and the invention is not to be limited to the details given herein, but may be modified within the scope and equivalents of the appended claims.
It will be appreciated by those skilled in the art that many changes could be made to the details of the above-described embodiments without departing from the underlying principles of the invention. Accordingly, the scope of the invention should be limited only by the attached claims.

Claims (20)

1. A method of configuring a processor, the method comprising:
receiving a program configured to run as software;
determining a first portion of the program to run in hardware and a second portion of the program to run as software;
determining a plurality of interconnected reprogrammable operand blocks based on the first portion, the plurality of interconnected reprogrammable operand blocks including one or more transformation functions that take input data from a previous data bus, perform one or more transformations on the input data, and output transformed input data via an output data bus;
configuring the plurality of interconnected reprogrammable operand blocks for execution on one or more reprogrammable processors;
executing, via one or more front-end processors, the second portion;
sending first data from the one or more front-end processors to the one or more reprogrammable processors;
executing the first portion via the one or more reprogrammable processors;
sending second data from the one or more reprogrammable processors to the one or more front-end processors; and
determining result data based on the first data and the second data.
2. The method of claim 1, wherein determining a first portion further comprises determining the first portion of the program to run in hardware based on computational complexity.
3. The method of claim 1, wherein the result data is a result of execution of the program.
4. The method of claim 1, wherein the second data is derived from the first data based on the one or more transformations.
5. The method of claim 1, wherein the result data is derived from the second data and the second data is derived from the first data.
6. The method of claim 1, wherein the one or more reprogrammable processors are field programmable gate arrays.
7. The method of claim 1, wherein sending the first data from the one or more front-end processors to the one or more reprogrammable processors further comprises sending the first data from the one or more front-end processors to the one or more reprogrammable processors via an expansion bus.
8. The method of claim 1, wherein receiving a program configured to run as software further comprises compiling the program into executable software code, a hardware configuration, and communication code for transferring data between the executable software code and the hardware configuration.
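By way of illustration only, the method of claims 1-8 can be sketched in ordinary software, with the interconnected reprogrammable operand blocks modeled as a chain of transformation functions. All names below are hypothetical; the claims do not prescribe any particular implementation.

```python
# Hypothetical sketch of the method of claim 1: partition a program into a
# software portion (run on a front-end processor) and a hardware portion
# (modeled here as a chain of transformation functions standing in for the
# interconnected reprogrammable operand blocks).

def partition(program):
    """Split a program (a list of steps) by a complexity flag (cf. claim 2)."""
    hardware = [step for step in program if step["complex"]]
    software = [step for step in program if not step["complex"]]
    return hardware, software

def run_front_end(software, data):
    """Execute the software portion on the front-end processor."""
    for step in software:
        data = step["fn"](data)
    return data

def run_reprogrammable(hardware, data):
    """Execute the hardware portion as a pipeline of operand blocks: each
    block takes input from the previous data bus and outputs the
    transformed data on its output data bus."""
    for block in hardware:
        data = block["fn"](data)
    return data

program = [
    {"fn": lambda x: x + 1, "complex": False},   # software step
    {"fn": lambda x: x * x, "complex": True},    # hardware-accelerated step
    {"fn": lambda x: x - 3, "complex": False},   # software step
]

hardware, software = partition(program)
first_data = run_front_end(software, 4)                 # sent to reprogrammable processors
second_data = run_reprogrammable(hardware, first_data)  # sent back to the front end
result = second_data                                    # result data (cf. claims 4-5)
print(result)  # prints 4
```

In a real system, the hardware portion would be compiled to a hardware configuration and synthesized onto one or more reprogrammable processors (e.g., field programmable gate arrays, per claim 6) rather than simulated as Python functions.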
9. A conveyor belt processor comprising:
an input data bus;
a plurality of interconnected reprogrammable operand blocks comprising:
an input data bus of a first reprogrammable operand block, the input data bus of the first reprogrammable operand block coupled to the input data bus or to an output data bus of a second reprogrammable operand block;
an output data bus of the first reprogrammable operand block, the output data bus of the first reprogrammable operand block coupled to an input data bus or an output data bus of a third reprogrammable operand block;
one or more transformation functions that take input data from a previous data bus, perform one or more transformations on the input data, and output transformed input data via an output data bus; and
the output data bus.
10. The conveyor belt processor of claim 9, wherein the bus widths of the plurality of interconnected reprogrammable operand blocks are not the same.
11. The conveyor belt processor of claim 9, wherein the output of a subsequent block is the input of a previous block.
12. The conveyor belt processor of claim 9, wherein the second reprogrammable operand block and the third reprogrammable operand block are the same.
13. The conveyor belt processor of claim 9, wherein the third reprogrammable operand block precedes the first reprogrammable operand block in execution order.
14. The conveyor belt processor of claim 9, further comprising a programming interface configured to receive instructions for creating the plurality of interconnected reprogrammable operand blocks.
15. A system for processing data, the system comprising:
a plurality of processors, each processor comprising:
an input data bus;
a plurality of interconnected reprogrammable operand blocks, the plurality of interconnected reprogrammable operand blocks comprising:
an input data bus of a first reprogrammable operand block, the input data bus of the first reprogrammable operand block coupled to the input data bus or to an output data bus of a second reprogrammable operand block;
an output data bus of the first reprogrammable operand block, the output data bus of the first reprogrammable operand block coupled to an input data bus or an output data bus of a third reprogrammable operand block; and
one or more transformation functions that take input data from a previous data bus, perform one or more transformations on the input data, and output transformed input data via an output data bus; and
the output data bus; and
a management function, wherein the management function is configured to distribute data to each processor.
16. The system of claim 15, wherein the management function is further configured to reconfigure the plurality of interconnected reprogrammable operand blocks.
17. The system of claim 15, wherein the management function comprises a memory storing instructions to create the plurality of interconnected reprogrammable operand blocks of the plurality of processors.
18. The system of claim 15, further comprising at least one front-end processor having a different architecture than the plurality of processors.
19. The system of claim 18, wherein the at least one front-end processor comprises a general purpose processor.
20. The system of claim 15, wherein the management function comprises:
a secure interface configured to receive a configuration change; and
a non-secure interface configured to distribute data to one or more of the plurality of processors.
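For illustration only, the conveyor belt processor of claims 9-14 can be modeled in software as a chain of operand blocks in which each block's output data bus feeds the next block's input data bus. The class and function names below are hypothetical and are not part of the claims.

```python
# Illustrative model of claim 9: a chain of reprogrammable operand blocks,
# each applying its transformation function to data arriving on its input
# data bus and forwarding the result on its output data bus.

class OperandBlock:
    def __init__(self, transform, width=32):
        self.transform = transform   # the block's transformation function
        self.width = width           # bus width; may differ per block (cf. claim 10)
        self.next_block = None       # output bus coupled to the next block's input bus

    def feed(self, data):
        # Apply the transformation and clip the result to the bus width.
        out = self.transform(data) & ((1 << self.width) - 1)
        if self.next_block is not None:
            return self.next_block.feed(out)  # forward on the output data bus
        return out                            # end of the chain

def chain(*blocks):
    """Couple each block's output data bus to the next block's input data bus."""
    for prev, nxt in zip(blocks, blocks[1:]):
        prev.next_block = nxt
    return blocks[0]

pipeline = chain(
    OperandBlock(lambda x: x + 1),
    OperandBlock(lambda x: x << 1, width=16),
    OperandBlock(lambda x: x ^ 0xFF),
)
print(pipeline.feed(10))  # 10 -> 11 -> 22 -> 233
```

The per-block `width` mask stands in for claim 10's differing bus widths; in hardware, each bus would simply be synthesized at its own width on the reprogrammable fabric.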
CN202080062645.1A 2019-09-06 2020-09-04 Systems, apparatus, and methods of conveyor belt processing Pending CN114730173A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201962896682P 2019-09-06 2019-09-06
US62/896,682 2019-09-06
PCT/US2020/070502 WO2021046581A1 (en) 2019-09-06 2020-09-04 Systems, apparatus, and methods of conveyor belt processing

Publications (1)

Publication Number Publication Date
CN114730173A true CN114730173A (en) 2022-07-08

Family

ID=74852611

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080062645.1A Pending CN114730173A (en) Systems, apparatus, and methods of conveyor belt processing

Country Status (10)

Country Link
US (1) US11704262B2 (en)
EP (1) EP4025966A4 (en)
JP (1) JP2022552606A (en)
KR (1) KR20220058612A (en)
CN (1) CN114730173A (en)
AU (1) AU2020342665A1 (en)
CA (1) CA3153033A1 (en)
GB (2) GB2625471A (en)
IL (1) IL291141A (en)
WO (1) WO2021046581A1 (en)

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7266490B2 (en) * 2000-12-28 2007-09-04 Robert Marc Zeidman Apparatus and method for connecting hardware to a circuit simulation
US7844758B1 (en) 2003-06-18 2010-11-30 Advanced Micro Devices, Inc. Dynamic resource allocation scheme for efficient use of a queue
WO2011091323A1 (en) 2010-01-21 2011-07-28 Qst Holdings, Llc A method and apparatus for a general-purpose, multiple-core system for implementing stream-based computations
US9052900B2 (en) 2013-01-29 2015-06-09 Oracle International Corporation Serdes fast retrain method upon exiting power saving mode
US11487445B2 (en) * 2016-11-22 2022-11-01 Intel Corporation Programmable integrated circuit with stacked memory die for storing configuration data
US10793369B2 (en) * 2017-07-12 2020-10-06 A9.Com, Inc. Conveyor system for autonomous robot
US20190105800A1 (en) * 2017-10-06 2019-04-11 Alex Xie Method and apparatus for forming marbelized engineered stone
WO2019126664A1 (en) * 2017-12-22 2019-06-27 Flexible Steel Lacing Company Apparatus and method for monitoring conveyor systems
US20190315570A1 (en) * 2018-04-17 2019-10-17 Walmart Apollo, Llc Dynamic conveyor belt item alignment system
US10611577B2 (en) * 2018-06-05 2020-04-07 Caterpillar Paving Products Inc. Cold planer with self-adjusting conveyor system
US11420828B1 (en) * 2020-12-11 2022-08-23 Amazon Technologies, Inc. System and method for item singulation

Also Published As

Publication number Publication date
EP4025966A4 (en) 2024-07-17
KR20220058612A (en) 2022-05-09
GB2603078B (en) 2024-05-29
IL291141A (en) 2022-05-01
US20220334987A1 (en) 2022-10-20
CA3153033A1 (en) 2021-03-11
GB2625471A (en) 2024-06-19
GB202204584D0 (en) 2022-05-11
US11704262B2 (en) 2023-07-18
AU2020342665A1 (en) 2022-03-24
EP4025966A1 (en) 2022-07-13
JP2022552606A (en) 2022-12-19
GB2603078A (en) 2022-07-27
WO2021046581A1 (en) 2021-03-11
GB202404004D0 (en) 2024-05-01

Similar Documents

Publication Publication Date Title
AU2018375286B2 (en) Software-defined quantum computer
US8522243B2 (en) Method for configuring resources and scheduling task processing with an order of precedence
JP2005510778A (en) Method and system for scheduling within an adaptive computing engine
US8650582B2 (en) Processing data communications messages with input/output control blocks
US11507531B2 (en) Apparatus and method to switch configurable logic units
US10817309B2 (en) Runtime optimization of configurable hardware
EP3682353A1 (en) Directed and interconnected grid dataflow architecture
Verdoscia et al. A Data‐Flow Soft‐Core Processor for Accelerating Scientific Calculation on FPGAs
US11704262B2 (en) Systems, apparatus, and methods of conveyor belt processing
EA044257B1 (en) SYSTEMS, DEVICES AND METHODS OF CONVEYOR PROCESSING
Heinz et al. Supporting on-chip dynamic parallelism for task-based hardware accelerators
Saldaña et al. Using partial reconfiguration in an embedded message-passing system
US11442794B1 (en) Event assignment for synchronization of concurrent execution engines
Sano et al. Hardware Algorithms
Naqvi et al. An optimization framework for dynamic pipeline management in computing systems
KR20240112371A (en) Universal systolic array
Bernier et al. INCREASING PERFORMANCES OF SCA APPLICATIONS THAT USE OPENCL

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination