WO2019000362A1 - Technologies for rapid configuration of field-programmable gate arrays - Google Patents

Technologies for rapid configuration of field-programmable gate arrays

Info

Publication number
WO2019000362A1
Authority
WO
WIPO (PCT)
Prior art keywords
bitstream
fpga
residual
new
codestream
Application number
PCT/CN2017/090995
Other languages
French (fr)
Inventor
Yong Jiang
Gaowei Xu
Yang KONG
Original Assignee
Intel Corporation
Application filed by Intel Corporation
Priority to PCT/CN2017/090995
Publication of WO2019000362A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7867 Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture

Definitions

  • FIG. 1 is a simplified block diagram of at least one embodiment of a compute device for rapid reconfiguration of an FPGA device;
  • FIG. 2 is a simplified block diagram of at least one embodiment of an environment that may be established by the compute device of FIG. 1;
  • FIGS. 3 and 4 are simplified flow diagrams of at least one embodiment of a method for rapid reconfiguration of an FPGA device that may be performed by the compute device of FIGS. 1 and 2;
  • references in the specification to “one embodiment,” “an embodiment,” “an illustrative embodiment,” etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may or may not necessarily include that particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
  • items included in a list in the form of “at least one A, B, and C” can mean (A); (B); (C); (A and B); (A and C); (B and C); or (A, B, and C).
  • items listed in the form of “at least one of A, B, or C” can mean (A); (B); (C); (A and B); (A and C); (B and C); or (A, B, and C).
  • the disclosed embodiments may be implemented, in some cases, in hardware, firmware, software, or any combination thereof.
  • the disclosed embodiments may also be implemented as instructions carried by or stored on a transitory or non-transitory machine-readable (e.g., computer-readable) storage medium, which may be read and executed by one or more processors.
  • a machine-readable storage medium may be embodied as any storage device, mechanism, or other physical structure for storing or transmitting information in a form readable by a machine (e.g., a volatile or non-volatile memory, a media disc, or other media device).
  • a compute device 106 includes a field programmable gate array (FPGA) 120.
  • the compute device 106 may upload a bitstream to the FPGA 120 in order to run a program on the FPGA 120.
  • the compute device 106 may then upload a new bitstream, which may be similar to the present bitstream of the FPGA 120.
  • the compute device 106 may determine the differences between the new bitstream and the present bitstream.
  • the compute device 106 may encode the differences into a codestream, which may be of a much smaller size than the bitstream.
  • the compute device 106 provides the codestream to the FPGA 120, which may then update the configuration of the FPGA 120 based on the differences between the new bitstream and the present bitstream, which may require less writing and, therefore, less time as compared to rewriting the entire configuration of the FPGA 120.
  • the compute device 106 may be embodied as any type of computation or computer device capable of performing the functions described herein, including, without limitation, a computer, a server, a rack-mounted server, a workstation, a desktop computer, a laptop computer, a notebook computer, a tablet computer, a mobile computing device, a wearable computing device, a network appliance, a web appliance, a distributed computing system, a processor-based system, and/or a consumer electronic device.
  • the compute device 106 illustratively includes a central processing unit (CPU) 110, a main memory 112, an input/output (I/O) subsystem 114, communication circuitry 116, the FPGA 120, and data storage 122.
  • the compute device 106 may include other or additional components, such as those commonly found in a computer (e.g., a display, various peripheral devices, etc.).
  • one or more of the illustrative components may be incorporated in, or otherwise form a portion of, another component.
  • the main memory 112, or portions thereof, may be incorporated in the CPU 110.
  • the CPU 110 may be embodied as any type of processor capable of performing the functions described herein.
  • the CPU 110 may be embodied as a single or multi-core processor(s), a microcontroller, or other processor or processing/controlling circuit.
  • the CPU 110 may be embodied as, include, or be coupled to a field programmable gate array (FPGA), an application specific integrated circuit (ASIC), a graphics processing unit (GPU), reconfigurable hardware or hardware circuitry, or other specialized hardware to facilitate performance of the functions described herein.
  • the main memory 112 may be embodied as any type of volatile (e.g., dynamic random access memory (DRAM), etc.) or non-volatile memory or data storage capable of performing the functions described herein.
  • the main memory 112 may be integrated into the CPU 110.
  • the main memory 112 may store various data and software used during operation of the compute device 106 such as packet data, operating systems, applications, programs, libraries, and drivers.
  • the main memory 112 is communicatively coupled to the CPU 110 via the I/O subsystem 114, which may be embodied as circuitry and/or components to facilitate input/output operations with the CPU 110, the main memory 112, and other components of the compute device 106.
  • the I/O subsystem 114 may be embodied as, or otherwise include, memory controller hubs, input/output control hubs, sensor hubs, firmware devices, communication links (e.g., point-to-point links, bus links, wires, cables, light guides, printed circuit board traces, etc.) and/or other components and subsystems to facilitate the input/output operations.
  • the I/O subsystem 114 may form a portion of a system-on-a-chip (SoC) and be incorporated, along with the CPU 110, the main memory 112, and other components of the compute device 106, on a single integrated circuit chip.
  • the compute device 106 includes the FPGA 120.
  • the FPGA 120 may be embodied as any hardware device (e.g., a co-processor, an FPGA, an ASIC, or circuitry) capable of performing functions that include rapid reconfiguration of FPGAs. More specifically, the FPGA 120 is any device capable of performing the rapid FPGA reconfiguration scheme described with respect to FIG. 6 below.
  • the FPGA 120 contains an array of programmable logic blocks with a hierarchy of reconfigurable interconnects that allow the logic blocks to be wired together.
  • the illustrative programmable logic blocks may be configured to perform complex combinational functions or merely simple logic gates like AND, OR, XOR, NAND, etc.
  • the specific configuration of an FPGA 120 includes the settings for the logic gates and interconnects.
  • a configuration of the FPGA 120 may be expressed as a bitstream with each bit of the bitstream corresponding to a particular setting, register, or bit value of the FPGA 120.
  • the communication circuitry 116 may be embodied as any communication circuit, device, or collection thereof, capable of enabling communications over a network between the compute device 106 and other components, such as other compute devices 106.
  • the communication circuitry 116 may be configured to use any one or more communication technologies (e.g., wired or wireless communications) and associated protocols (e.g., Ethernet, WiMAX, etc.) to effect such communication.
  • the illustrative communication circuitry 116 includes a network interface controller (NIC) 118, which may also be referred to as a host fabric interface (HFI).
  • the NIC 118 may be embodied as one or more add-in-boards, daughtercards, network interface cards, controller chips, chipsets, or other devices that may be used by the compute device 106 to connect with other components, such as other compute devices 106.
  • the NIC 118 may be embodied as part of a system-on-a-chip (SoC) that includes one or more processors, or included on a multichip package that also contains one or more processors.
  • the NIC 118 may include a local processor (not shown) and/or a local memory (not shown) that are both local to the NIC 118.
  • the local processor of the NIC 118 may be capable of performing one or more of the functions of the CPU 110 described herein.
  • the local memory of the NIC 118 may be integrated into one or more components of the compute device 106 at the board level, socket level, chip level, and/or other levels.
  • the compute device 106 may additionally include a data storage device 122, which may be embodied as any type of device or devices configured for short-term or long-term storage of data such as, for example, memory devices and circuits, memory cards, hard disk drives, solid-state drives, or other data storage devices.
  • the data storage device 122 may include a system partition that stores data and firmware code for the compute device 106.
  • the data storage device 122 may also include an operating system partition that stores data files and executables for an operating system of the compute device 106.
  • the compute device 106 may include a display 124.
  • the display 124 may be embodied as, or otherwise use, any suitable display technology including, for example, a liquid crystal display (LCD), a light emitting diode (LED) display, a cathode ray tube (CRT) display, a plasma display, and/or other display usable in a compute device.
  • the display may include a touchscreen sensor that uses any suitable touchscreen input technology to detect the user’s tactile selection of information displayed on the display including, but not limited to, resistive touchscreen sensors, capacitive touchscreen sensors, surface acoustic wave (SAW) touchscreen sensors, infrared touchscreen sensors, optical imaging touchscreen sensors, acoustic touchscreen sensors, and/or other type of touchscreen sensors.
  • the compute device 106 may include one or more peripheral devices 126.
  • peripheral devices 126 may include any type of peripheral device commonly found in a compute device such as speakers, a mouse, a keyboard, and/or other input/output devices, interface devices, and/or other peripheral devices.
  • the compute device 106 may include other components, sub-components and/or devices commonly found in a compute device, which are not illustrated in FIG. 1 for clarity of the description.
  • the compute device 106 may establish an environment 200 during operation.
  • the illustrative environment 200 includes an FPGA manager 220, a bitstream manager 230, a residual manager 240, and a codestream generation manager 250.
  • the illustrative environment 200 also includes the FPGA 120, described above with respect to FIG. 1.
  • the FPGA 120 includes a bitstream receiver 260, a bitstream decoder 270, and a bitstream loader 280.
  • Each of the components of the environment 200 may be embodied as hardware, firmware, software, or a combination thereof.
  • one or more of the components of the environment 200 may be embodied as circuitry or collection of electrical devices (e.g., FPGA manager circuitry 220, bitstream manager circuitry 230, residual manager circuitry 240, codestream generation manager circuitry 250, etc.). It should be appreciated that, in such embodiments, one or more of the FPGA manager circuitry 220, bitstream manager circuitry 230, residual manager circuitry 240, and codestream generation manager circuitry 250 may form a portion of one or more of the CPU 110, main memory 112, I/O subsystem 114, communication circuitry 116 and/or other components of the compute device 106.
  • one or more of the illustrative components may form a portion of another component and/or one or more of the illustrative components may be independent of one another.
  • one or more of the components of the environment 200 may be embodied as virtualized hardware components or emulated architecture, which may be established and maintained by the CPU 110 or other components of the compute device 106.
  • the compute device 106 also includes stream data 202, residual codestream data 204, and compression codestream data 206.
  • the stream data 202 may be embodied as any data that is indicative of one or more bitstreams intended for or transmitted to an FPGA during reconfiguration.
  • the stream data 202, in the illustrative embodiment, also includes data indicative of codestreams that are generated using any bitstreams intended for an FPGA.
  • the residual codestream data 204 may be embodied as any data that is indicative of differences between two bitstreams, such as the differences between a new bitstream and a bitstream of a present configuration of the FPGA 120.
  • the residual codestream data 204 may also be embodied as any data that is generated by one or more residual encoding algorithms that may be employed to generate a residual codestream, a residual bitstream, or the like.
  • the compression codestream data 206 may be embodied as any data usable to encode FPGA configurations into a codestream using techniques that do not require generation of a residual bitstream.
  • the compression codestream data 206 may be embodied as any data that is generated by one or more compression algorithms.
  • the stream data 202, residual codestream data 204, and compression codestream data 206 may be accessed by the various components and/or sub-components of the compute device 106.
  • the FPGA manager 220, which may be embodied as hardware, firmware, software, virtualized hardware, emulated architecture, and/or a combination thereof as discussed above, is configured to facilitate management of connected FPGA devices. More specifically, the FPGA manager 220 may be configured to load updated bitstreams into connected FPGA devices using communication techniques such as streaming. The FPGA manager 220 may schedule when certain configurations of the FPGA 120 are loaded onto the FPGA 120, how long a particular configuration may be executed by the FPGA 120, and the order in which different configurations of the FPGA 120 are executed; it may also provide input to the FPGA 120, receive output from the FPGA 120, etc. In some embodiments, the FPGA manager 220 may perform a partial reconfiguration of the FPGA 120 by providing a bitstream corresponding to a certain subset of the settings, registers, or bit values of the FPGA 120.
  • the bitstream manager 230, which may be embodied as hardware, firmware, software, virtualized hardware, emulated architecture, and/or a combination thereof as discussed above, is configured to manage bitstreams that may be loaded onto the FPGA 120.
  • bitstream refers to a sequence of bits (or binary values) that can be serially transmitted from one compute device to another.
  • a bitstream may refer to the sequence of bits before being transmitted to the FPGA 120 or after the values of the bitstream are loaded into the FPGA 120, and the bitstream is not limited to referring to the sequence of bits when the sequence is actually being transmitted.
  • a bitstream may either be a complete bitstream, which specifies the setting for every logic block and reconfigurable interconnect of the FPGA 120, or a partial bitstream, which specifies the setting for every logic block and reconfigurable interconnect of only a portion of the FPGA 120. It should be appreciated that the techniques and algorithms disclosed herein may be applied to a complete reconfiguration of the FPGA 120 or to a partial reconfiguration of the FPGA 120.
  • the bitstream manager 230 may generate bitstreams, such as by compiling a program made up of instructions that are written using a particular language (e.g., a programming language or a hardware-definition language).
  • the bitstream manager 230 is configured to convert these instructions into a stream or series of bits (or bytes, or hexadecimal values, or the like).
  • the bitstream manager 230 may also generate a bitstream based on the current configuration for the FPGA 120.
  • when compiling a program to generate a bitstream, the bitstream manager 230 may be configured to generate the bitstream in such a way that it has fewer differences with a known bitstream, such as a bitstream currently loaded on the FPGA 120, than it otherwise would.
  • the bitstream manager 230 may compile a segment of the program and determine a configuration of a portion of the FPGA 120 such as the settings associated with a set of logic blocks near each other.
  • the bitstream manager 230 may compare the configuration of that set of logic blocks with the configuration of the FPGA 120. If the settings of a portion of the current configuration of the FPGA 120 are similar or identical to the settings of the set of logic blocks, then the bitstream manager 230 may place that set of logic blocks in the same physical location as in the current configuration of the FPGA 120. It should be appreciated that such an approach may reduce or minimize the differences between the bitstream associated with the current configuration of the FPGA 120 and the bitstream that the bitstream manager 230 generates. In some embodiments, the bitstream manager 230 may generate several versions of a bitstream for the same program, such as by generating a version based on a comparison with each of several other bitstreams.
  • the version of the bitstream of the program may be selected based on which of the several other bitstreams is currently loaded on the FPGA 120 or based on which of the several versions has the fewest differences with the current bitstream loaded on the FPGA 120. Additionally or alternatively, the bitstream manager 230 may generate the bitstream of a program shortly before loading the bitstream onto the FPGA 120, generating it so that it has relatively few differences with the bitstream currently loaded on the FPGA 120.
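The version-selection step described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes each candidate version is held in memory as a `bytes` object of the same length as the current bitstream, and that "differences" means the number of positions whose values differ. The names `count_differences` and `select_version` are hypothetical.

```python
from typing import Mapping


def count_differences(a: bytes, b: bytes) -> int:
    """Count the positions at which two equal-length bitstreams hold different values."""
    if len(a) != len(b):
        raise ValueError("bitstreams must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def select_version(versions: Mapping[str, bytes], current: bytes) -> str:
    """Return the name of the version with the fewest differences from `current`."""
    return min(versions, key=lambda name: count_differences(versions[name], current))


# Hypothetical example: two compiled versions of the same program.
current = bytes([0x00, 0x7F, 0x12, 0x00])
versions = {
    "placed_near_current": bytes([0x00, 0x7F, 0x12, 0xFF]),  # 1 difference
    "placed_elsewhere": bytes([0xFF, 0xFF, 0x12, 0x00]),     # 2 differences
}
print(select_version(versions, current))  # placed_near_current
```

Counting differing positions is only one possible criterion; a sketch like this could equally rank versions by the length of the codestream each one would produce.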
  • the residual manager 240, which may be embodied as hardware, firmware, software, virtualized hardware, emulated architecture, and/or a combination thereof as discussed above, is configured to generate residual bitstreams by comparing two bitstreams for the FPGA 120.
  • the residual manager 240 may be configured to receive an updated bitstream intended for an FPGA device, identify the FPGA device from a group of connected devices, and retrieve the current bitstream corresponding to the current configuration of the FPGA device.
  • the residual manager 240 may be further configured to generate a residual bitstream using the current bitstream and the updated bitstream.
  • the residual bitstream indicates the differences between the two bitstreams being compared.
  • the current bitstream and the updated bitstream may be parsed into a series of values, with each value being assigned an index, counter, or other referential indicator to indicate the position of each value in the series.
  • the residual manager 240 compares the value at the corresponding index for each bitstream. For example, the residual manager 240 compares the value at index 0 for both the current bitstream and the updated bitstream. If the values are identical, a zero residual value may be generated. In other words, the zero residual value gives an indication that values in both bitstreams are the same at the corresponding index. If the values are not identical, a non-zero residual value may be generated.
  • the non-zero residual value may indicate the differences between the value of the current bitstream and the value of the updated bitstream (such as an exclusive or (XOR) operation between the values) or may indicate the value of the updated bitstream.
  • in embodiments in which a non-zero residual value indicates the value of the updated bitstream, the residual bitstream may also provide an indication that a zero residual value at a certain index means that there is a difference between the current bitstream and the updated bitstream and that zero is the updated value.
  • the values at each index of the bitstreams may be one or more bits and may be, for example, represented in hexadecimal. Where the values in each configuration are in hexadecimal form, the residual may also be generated as a hexadecimal value. Accordingly, a zero residual value may be represented as 0x00, and a non-zero residual value may be represented as, for example, 0x7f, 0x12, or the like.
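A minimal Python sketch of the index-by-index residual described above, assuming each bitstream value is one byte (written in hexadecimal, as in the 0x00 and 0x7f examples). The function name `residual_bitstream` and the `use_xor` switch are hypothetical; the switch selects between the two forms of non-zero residual value mentioned above (the XOR of the two values, or the updated value itself).

```python
def residual_bitstream(current: bytes, updated: bytes, use_xor: bool = True) -> bytes:
    """Compare two bitstreams index by index and return one residual value per index."""
    if len(current) != len(updated):
        raise ValueError("bitstreams must have the same length")
    residual = bytearray()
    for cur, new in zip(current, updated):
        if cur == new:
            residual.append(0x00)        # zero residual: values are identical at this index
        elif use_xor:
            residual.append(cur ^ new)   # non-zero residual: XOR of the two differing values
        else:
            # Non-zero residual carries the updated value. If that value is 0x00 it is
            # indistinguishable from "no change", so a separate indication is needed.
            residual.append(new)
    return bytes(residual)


current = bytes([0x10, 0x20, 0x30])
updated = bytes([0x10, 0x5F, 0x30])
print(residual_bitstream(current, updated).hex())                 # 007f00
print(residual_bitstream(current, updated, use_xor=False).hex())  # 005f00
```

With the XOR form, the updated bitstream can be recovered by XOR-ing the residual back onto the current bitstream, because unchanged positions carry 0x00 and x ^ 0 = x.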
  • the codestream generation manager 250, which may be embodied as hardware, firmware, software, virtualized hardware, emulated architecture, and/or a combination thereof as discussed above, is configured to generate a codestream from bitstreams that are generated and manipulated by the bitstream manager 230 and the residual manager 240. For example, the residual manager 240 may generate a residual bitstream that is then used to generate a codestream, which can be loaded into the FPGA 120 as the updated configuration.
  • the codestream generation manager 250 may be configured to generate codestreams using more than one algorithm. For example, the codestream generation manager 250 may perform a compression algorithm such as run-length encoding on the bitstream to generate a compression codestream.
  • the compression codestream can be decompressed to generate the bitstream without any additional information such as information indicating differences with another bitstream.
  • a compression codestream generated by the codestream generation manager 250 with use of a run-length encoding may be embodied as a series of pairs of values, with the first value of each pair indicating a bitstream value of the bitstream and the second value of each pair indicating the number of consecutive values of the bitstream that are the first value.
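The run-length form of the compression codestream can be sketched as below. This assumes one-byte bitstream values and represents the codestream as a Python list of `(value, count)` tuples; how the pairs would be packed into binary is not specified here, and `rle_encode` and `rle_decode` are hypothetical names.

```python
from typing import List, Tuple


def rle_encode(bitstream: bytes) -> List[Tuple[int, int]]:
    """Encode a bitstream as (value, run_length) pairs."""
    pairs: List[Tuple[int, int]] = []
    for value in bitstream:
        if pairs and pairs[-1][0] == value:
            pairs[-1] = (value, pairs[-1][1] + 1)  # extend the current run
        else:
            pairs.append((value, 1))               # start a new run
    return pairs


def rle_decode(pairs: List[Tuple[int, int]]) -> bytes:
    """Rebuild the bitstream from the pairs alone; no other bitstream is needed."""
    return b"".join(bytes([value]) * count for value, count in pairs)


bitstream = bytes([0x00, 0x00, 0x00, 0x7F, 0x7F, 0x12])
codes = rle_encode(bitstream)
print(codes)  # [(0, 3), (127, 2), (18, 1)]
assert rle_decode(codes) == bitstream
```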
  • the codestream generation manager 250 may also encode the bitstream into a residual codestream using the residual bitstream, such as by indicating which locations of the updated bitstream are different from the current bitstream in a compressed form. In the illustrative embodiment, as described in more detail below in regard to FIG.
  • a residual codestream may be embodied as a series of pairs of values or triples of values.
  • Each pair of values includes a first value (such as a 0) indicating that there is not a difference between the current bitstream and the updated bitstream and a second value indicating the number of consecutive locations that are not different.
  • Each triple of values includes a first value (such as 1) indicating that there is a difference between the current bitstream and the updated bitstream, a second value indicating a value of the updated bitstream, and a third value indicating the number of consecutive values of the updated bitstream that have the second value.
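The pair/triple layout of the residual codestream can be sketched as follows. It is a minimal illustration under stated assumptions: values are one byte, a pair is `(0, run_length)` for unchanged positions, a triple is `(1, new_value, run_length)` for consecutive changed positions that share the same updated value, and the codestream is kept as a list of tuples rather than packed bits. `residual_encode` is a hypothetical name.

```python
from typing import List, Tuple

Code = Tuple[int, ...]


def residual_encode(current: bytes, updated: bytes) -> List[Code]:
    """Encode the updated bitstream relative to the current one as pairs and triples."""
    if len(current) != len(updated):
        raise ValueError("bitstreams must have the same length")
    codes: List[Code] = []
    i, n = 0, len(updated)
    while i < n:
        j = i
        if current[i] == updated[i]:
            # Run of unchanged positions -> pair (0, run_length).
            while j < n and current[j] == updated[j]:
                j += 1
            codes.append((0, j - i))
        else:
            # Run of changed positions with the same new value -> triple (1, value, run_length).
            value = updated[i]
            while j < n and current[j] != updated[j] and updated[j] == value:
                j += 1
            codes.append((1, value, j - i))
        i = j
    return codes


current = bytes([1, 1, 1, 1, 5, 5, 9])
updated = bytes([1, 1, 1, 1, 7, 7, 9])
print(residual_encode(current, updated))  # [(0, 4), (1, 7, 2), (0, 1)]
```

When only a few scattered positions change, long unchanged runs collapse into single pairs, which is why this form can be much shorter than the bitstream itself.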
  • each of the compression codestream and the residual codestream may include an indication of which type of encoding is used, such as a single bit at the beginning of the codestream.
  • the codestream generation manager 250 may generate both a compression codestream and a residual codestream of the updated bitstream, and provide the smaller of the compression codestream and the residual codestream to the FPGA manager 220.
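Selecting the smaller codestream and marking its type, as described above, might look like the sketch below. It assumes both codestreams are already available as lists of integer tuples (run-length pairs for the compression codestream, pairs and triples for the residual codestream), measures size as the number of encoded values rather than packed bits, and uses a whole leading integer instead of a single bit for the type indicator. The flag values `FLAG_COMPRESSION = 0` and `FLAG_RESIDUAL = 1` and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical type indicators; the source only says a leading indication
# (such as a single bit) identifies which encoding was used.
FLAG_COMPRESSION = 0
FLAG_RESIDUAL = 1


def flatten(codes: Sequence[Tuple[int, ...]]) -> List[int]:
    """Serialize a list of pairs/triples into a flat list of integers."""
    return [value for code in codes for value in code]


def select_codestream(compression_codes: Sequence[Tuple[int, ...]],
                      residual_codes: Sequence[Tuple[int, ...]]) -> List[int]:
    """Return the shorter codestream, prefixed with its encoding flag."""
    compression = flatten(compression_codes)
    residual = flatten(residual_codes)
    if len(residual) < len(compression):
        return [FLAG_RESIDUAL] + residual
    # Design choice in this sketch: on a tie, prefer compression, since it
    # decodes without knowledge of the current bitstream.
    return [FLAG_COMPRESSION] + compression


# Updated bitstream [3, 8, 2, 6, 9], where only the last value changed.
compression = [(3, 1), (8, 1), (2, 1), (6, 1), (9, 1)]
residual = [(0, 4), (1, 9, 1)]
print(select_codestream(compression, residual))  # [1, 0, 4, 1, 9, 1]
```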
  • alternatively, the codestream generation manager 250 may generate only one codestream based on, e.g., an indication from the user or from another component of the compute device 106 that the updated bitstream is similar to the current bitstream (in which case the residual codestream may be used) or that the updated bitstream is not similar to the current bitstream (in which case the compression codestream may be used).
  • whether the residual codestream is used may be based on whether there is a residual bitstream stored in the stream data 202 that was generated based on the current bitstream of the FPGA 120. If there is such a residual bitstream, then the residual codestream may be used.
  • the FPGA 120 is configured to acquire a bitstream and load it as a new or updated configuration.
  • the FPGA 120 may accept input from another component of the compute device 106 such as the FPGA manager 220, operate the logic gates of the current configuration of the FPGA 120, and provide output to components of the compute device 106 such as the FPGA manager 220.
  • the bitstream receiver 260 is configured to receive a bitstream from another component of the compute device 106 such as the FPGA manager 220.
  • the bitstream receiver 260 of the FPGA 120 is configured to receive an encoded bitstream, such as a residual codestream or a compression codestream of the updated bitstream.
  • the bitstream decoder 270 of the FPGA 120 is configured to decode the encoded bitstream.
  • the bitstream decoder 270 may decode the encoded bitstream by first determining what encoding is used and then decoding the bitstream.
  • the bitstream loader 280 is configured to update the configuration of the FPGA 120 by writing values of the bitstream to the corresponding settings, registers, or bit values of the FPGA 120 based on the decoded bitstream. If the encoded bitstream is a residual codestream, the bitstream loader 280 may only write values of the bitstream that are indicated to be different from the current bitstream.
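On the FPGA side, the decoder and loader behavior described above can be sketched as follows. The sketch assumes a hypothetical flat layout: a leading flag (0 for a compression codestream, 1 for a residual codestream), followed either by `(value, count)` run-length pairs or by residual pairs `(0, run)` and triples `(1, value, run)`. The configuration is modeled as a `bytearray` of one-byte values, and the function returns how many values were written so that the savings of residual loading are visible. `load_codestream` and the flag values are hypothetical.

```python
from typing import List

FLAG_COMPRESSION = 0  # hypothetical flag values for this sketch
FLAG_RESIDUAL = 1


def load_codestream(config: bytearray, codestream: List[int]) -> int:
    """Decode `codestream`, update `config` in place, and return the number of values written."""
    flag, body = codestream[0], codestream[1:]
    if flag == FLAG_COMPRESSION:
        # Run-length pairs (value, count) rebuild the whole bitstream, so every value is written.
        decoded = b"".join(bytes([v]) * n for v, n in zip(body[0::2], body[1::2]))
        config[:] = decoded
        return len(decoded)
    if flag != FLAG_RESIDUAL:
        raise ValueError(f"unknown encoding flag {flag}")
    writes = pos = i = 0
    while i < len(body):
        if body[i] == 0:
            # Pair (0, run): these positions are unchanged, so skip them without writing.
            pos += body[i + 1]
            i += 2
        else:
            # Triple (1, value, run): write the new value into `run` consecutive positions.
            value, run = body[i + 1], body[i + 2]
            config[pos:pos + run] = bytes([value]) * run
            pos += run
            writes += run
            i += 3
    return writes


config = bytearray([1, 1, 1, 1, 5, 5, 9])
written = load_codestream(config, [FLAG_RESIDUAL, 0, 4, 1, 7, 2, 0, 1])
print(list(config), written)  # [1, 1, 1, 1, 7, 7, 9] 2
```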
  • each of the FPGA manager 220, bitstream manager 230, residual manager 240, and codestream generation manager 250 may be separately embodied as hardware, firmware, software, virtualized hardware, emulated architecture, and/or a combination thereof.
  • the FPGA manager 220 may be embodied as a hardware component.
  • the bitstream manager 230 and the codestream generation manager 250 may be embodied as virtualized hardware components or as some other combination of hardware, firmware, software, virtualized hardware, and/or emulated architecture.
  • the compute device 106 may execute a method 300 for rapid reconfiguration of an FPGA device. It should be appreciated that, in some embodiments, the operations of the method 300 may be performed by one or more components of the environment 200 of the compute device 106 as shown in FIG. 2.
  • the method 300 begins with block 302, in which the compute device 106 determines whether to load a new program (e.g., an updated bitstream) into the FPGA 120.
  • the compute device 106 may determine to perform the rapid FPGA reconfiguration in response to determining that the FPGA manager 220 has received a new or updated bitstream or program for the FPGA 120, in response to a determination that a setting stored in a configuration file in the data storage device 122 indicates to perform the rapid FPGA reconfiguration, and/or as a function of other criteria. If the compute device 106 is not to load a new program, the method 300 loops back to block 302. If the compute device 106 is to load a new program, the method 300 advances to block 304, in which the compute device 106 selects a new program to load into the FPGA 120. In the illustrative embodiment, the program selection may be based on a variety of criteria.
  • the compute device 106 may have received an explicit instruction (e.g., from an operator or other compute device) that a new program should be loaded. Additionally or alternatively, the compute device 106 may have a schedule for when certain programs should be run on the FPGA 120 or other criteria indicating when certain programs should be run.
  • the compute device 106 acquires the bitstream for the new program.
  • the bitstream of the new program may be generated by the compute device 106 or may be stored in the compute device 106.
  • the compute device 106 may generate the bitstream of the new program so as to minimize or reduce the differences with a current bitstream loaded on the FPGA 120.
  • the compute device 106 may have already generated a bitstream with relatively few differences to the current bitstream and such a bitstream may be stored on the compute device 106.
  • the compute device 106 may have generated different versions of the same program, such as versions which have a low number of differences as compared to various known bitstreams.
  • the compute device 106 may choose a version of the program which is known to have a low number of differences as compared to the current bitstream of the FPGA 120, or the compute device 106 may determine which of the several versions of the program has the fewest differences as compared to the current bitstream and choose that version.
  • the compute device 106 determines an encoding algorithm to use to encode the new bitstream before transmitting the new bitstream to the FPGA 120.
  • the compute device 106 may compare the length of a residual codestream and the length of a compression codestream and select the codestream that is shorter. To do so, the compute device 106 may generate a residual codestream of the new bitstream and the current bitstream in block 310.
  • the compute device 106 may first calculate the residual of the new bitstream and the current bitstream in block 312. As discussed above in more detail, the residual bitstream indicates the differences between the two bitstreams being compared. In the illustrative embodiment, as described in more detail below in regard to FIG.
  • the residual bitstream may be embodied as the bit-wise XOR operation of the two bitstreams being compared.
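Treating the two bitstreams as bit sequences, the bit-wise XOR residual can be sketched in a few lines of Python. This is an illustrative sketch assuming both bitstreams are `bytes` of equal length; set bits in the result mark the configuration bits that must change. `xor_residual` and `changed_bits` are hypothetical names.

```python
def xor_residual(current: bytes, updated: bytes) -> bytes:
    """Bit-wise XOR of two equal-length bitstreams."""
    if len(current) != len(updated):
        raise ValueError("bitstreams must have the same length")
    diff = int.from_bytes(current, "big") ^ int.from_bytes(updated, "big")
    return diff.to_bytes(len(current), "big")


def changed_bits(residual: bytes) -> int:
    """Count the set bits, i.e. the configuration bits that differ."""
    return sum(bin(byte).count("1") for byte in residual)


current = bytes([0b1010_0000, 0xFF])
updated = bytes([0b1010_0011, 0xFF])
residual = xor_residual(current, updated)
print(residual.hex(), changed_bits(residual))  # 0300 2
```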
  • the residual codestream is a compressed representation of the residual bitstream.
  • a residual codestream may be embodied as a series of pairs of values or triples of values. Each pair of values includes a first value (such as a 0) indicating that there is not a difference between the current bitstream and the updated bitstream and a second value indicating the number of consecutive locations that are not different.
  • Each triple of values includes a first value (such as 1) indicating that there is a difference between the current bitstream and the updated bitstream, a second value indicating a value of the updated bitstream, and a third value indicating the number of consecutive values of the updated bitstream that have the second value.
  • the compute device 106 may generate a compression codestream of the new bitstream using.
  • the compression codestream can be decompressed to generate the bitstream without any additional information such as information indicating differences with another bitstream.
  • a compression codestream generated by the codestream generation manager 250 with use of a run-length encoding may be embodied as a series of pairs of values, with the first value of each pair indicating a bitstream value of the bitstream and the second value of each pair indicating the number of consecutive values of the bitstream that are the first value.
  • the compute device 106 may determine an encoding algorithm based on a factor other than the lengths of the residual codestream and the compression codestream. For example, the compute device 106 may receive an indication from a user of the compute device 106 indicating which encoding algorithm to use, or the compute device 106 may determine to use residual encoding based on the presence of a stored bitstream corresponding to the new program that is known to have a small number of differences with the current bitstream.
  • the method 300 proceeds to block 316 in FIG. 4, in which, if the compute device 106 determined in block 308 not to use residual encoding, the method 300 proceeds to block 324, in which the compute device 106 generates a compression codestream of the new bitstream. If the compute device 106 determined in block 308 to use residual encoding, the method 300 proceeds to block 318, in which the compute device 106 generates a residual codestream of the new bitstream and the current bitstream, as described above in regard to block 310. Of course, in embodiments in which the compute device 106 generated the residual codestream as part of determining whether or not to use residual encoding, the compute device 106 need not regenerate the residual codestream in block 318.
  • the compute device 106 provides an indication as part of the residual codestream that residual encoding was used.
  • the compute device 106 may add and clear a one-bit flag at the beginning of the codestream.
  • the method 300 proceeds to block 330, in which the compute device 106 loads the new codestream into the FPGA 120.
  • the method 300 proceeds to block 324, in which the compute device 106 generates a compression codestream of the new bitstream, as described above in regard to block 314.
  • the compute device 106 need not regenerate the compression codestream in block 324.
  • the compute device 106 provides an indication as part of the compression codestream that compression encoding was used.
  • the compute device 106 may add and set a one-bit flag at the beginning of the codestream.
  • the compute device 106 loads the codestream into the FPGA 120.
  • the compute device 106 includes the one-bit flag in the codestream to denote the encoding type (e.g., residual encoding or compressive encoding).
  • the method 300 may then loop back to block 302 in FIG. 3.
  • the compute device 106 proceeds to execute a method 500 for generating a residual codestream from a residual bitstream.
  • the residual bitstream may be a series of values that are each denoted by an index.
  • An integer index i from 0 to some non-zero value n may be used.
  • the compute device 106 retrieves the residual value at the particular location in the residual bitstream indicated by the index.
  • the method advances to block 506, in which the compute device 106 determines whether the residual value at the index i equals a zero residual value.
  • the method advances to block 510, in which the compute device 106 determines a number of consecutive residual values that are of value 0x00.
  • the method advances to block 512, in which the compute device 106 encodes a ‘skip’ command in the residual codestream.
  • the skip command indicates that the values at that index in the current bitstream and the updated bitstream are identical and so there is no need to encode the updated bitstream value. Therefore, the value at that index in the updated configuration bitstream may simply be skipped such that the current value at that index (i.e., the value from the current configuration) will be retained on the FPGA.
  • the number of consecutive zero values to be skipped is also encoded at block 514.
  • the method advances to block 516, in which the compute device 106 increments the index i for the residual bitstream by the number of values skipped.
  • the method advances to block 528, in which the compute device 106 determines whether the residual bitstream has more values.
  • the method advances to block 518.
  • the residual value being non-zero indicates that, at this index location, the value in the current configuration bitstream differs from that in the updated configuration bitstream.
  • the compute device 106 determines the number of consecutive non-zero residual values corresponding to the same new bitstream value. It should be appreciated that different consecutive non-zero residual values may correspond to the same consecutive value in the new bitstream, such as when the consecutive values of the new bitstream are the same but the consecutive values of the current bitstream are not the same.
  • the compute device 106 encodes an overwrite command in the codestream.
  • the method advances to block 522, in which the compute device 106 encodes the value of the new bitstream at index i in the residual codestream. Additionally, the compute device 106 specifies, at block 524, the number of consecutive values to be overwritten. The method then advances to block 526, in which the compute device 106 increments the index i by the number of values overwritten as specified in block 524.
  • the method 500 then advances to block 528, in which the compute device 106 determines whether there are more values in the residual bitstream. If there are more values in the residual bitstream, the method 500 loops back to block 504. If there are not more values in the residual bitstream, then encoding the residual codestream is complete.
  • the FPGA 120 may execute a method 600 for performing a codestream write process.
  • the FPGA 120 is configured to determine whether to perform the codestream write process at block 602. If there is a determination to perform the codestream write process, the method advances to block 604, in which the FPGA 120 receives a codestream to be written to the FPGA 120.
  • the FPGA 120 determines the encoding that was used to generate the codestream. For example, the FPGA 120 may parse the first bit of the codestream. As described above with respect to FIG. 4, the first bit of the codestream may represent the type of encoding that was used to generate the codestream (e.g., compressive encoding, residual encoding, or the like). In block 608, if a residual encoding is used, the method 600 advances to block 610, in which the FPGA 120 decodes the codestream using a residual decoding algorithm.
  • the FPGA 120 writes, based on the decoded residual bitstream, the values of the new bitstream to the appropriate logic blocks and interconnects whose current values differ from the new bitstream. It should be appreciated that the values of the new bitstream that are the same as the current bitstream do not need to be written.
  • the FPGA 120 may pause operation of the logic components of the FPGA 120 while writing the values in block 612. In such embodiments, the FPGA 120 will continue execution of the logic gates after writing the values in block 612 is complete.
  • the method 600 then loops back to block 602, in which the FPGA 120 evaluates whether the codestream write process should be performed.
  • the method 600 advances to block 614, in which the FPGA 120 decodes the codestream using a decompression algorithm.
  • the FPGA 120 writes the values of the appropriate logic blocks and interconnects based on the decoded bitstream.
  • the FPGA 120 may pause operation of the logic components of the FPGA 120 while writing the values in block 616. In such embodiments, the FPGA 120 will continue execution of the logic gates after writing the values in block 616 is complete.
  • the method 600 then loops back to block 602, in which the FPGA 120 evaluates whether the codestream write process should be performed.
  • An embodiment of the technologies disclosed herein may include any one or more, and any combination of, the examples described below.
  • Example 1 includes a compute device to reconfigure a field-programmable gate array (FPGA), the compute device comprising one or more processors; one or more memory devices having stored therein a plurality of instructions that, when executed by the one or more processors, cause the compute device to generate a residual bitstream based on a new bitstream for the FPGA and a current bitstream of a current configuration of the FPGA, wherein the residual bitstream indicates differences between the new bitstream and the current bitstream; generate a residual codestream corresponding to the residual bitstream, wherein the residual codestream indicates a plurality of bitstream locations which are different between the new bitstream and the current bitstream and indicates values of the new bitstream at each of the plurality of bitstream locations; and load the residual codestream into the FPGA.
  • Example 2 includes the subject matter of Example 1, and wherein the plurality of instructions, when executed by the one or more processors, further cause the compute device to generate a compression codestream corresponding to the new bitstream; compare a length of the compression codestream to a length of the residual codestream; and determine that the length of the residual codestream is less than the length of the compression codestream.
  • Example 3 includes the subject matter of any of Examples 1 and 2, and wherein to generate the residual bitstream comprises to perform a bit-wise exclusive or operation on the new bitstream and the current bitstream.
  • Example 4 includes the subject matter of any of Examples 1-3, and wherein the plurality of instructions, when executed by the one or more processors, further cause the compute device to encode a one-bit flag in the residual codestream, wherein the one-bit flag represents a residual encoding type for the residual codestream.
  • Example 5 includes the subject matter of any of Examples 1-4, and wherein the plurality of instructions further cause the compute device to acquire a new program, wherein the new program comprises a plurality of instructions; and compile, based on the current bitstream, the new program to generate the new bitstream.
  • Example 6 includes the subject matter of any of Examples 1-5, and wherein the bitstream is a partial bitstream.
  • Example 7 includes the subject matter of any of Examples 1-6, and wherein the bitstream is a complete bitstream.
  • Example 8 includes the subject matter of any of Examples 1-7, and wherein the plurality of instructions, when executed by the one or more processors, further cause the compute device to acquire a plurality of bitstreams; acquire a new program, wherein the new program comprises a plurality of instructions; compile, for each of the bitstreams of the plurality of bitstreams and based on the corresponding bitstream of the plurality of bitstreams, the new program into a corresponding bitstream version of a plurality of bitstream versions corresponding to the new program; and store the plurality of bitstream versions corresponding to the new program in data storage of the compute device.
  • Example 9 includes the subject matter of any of Examples 1-8, and wherein the plurality of bitstreams includes the current bitstream, wherein the plurality of instructions further cause the compute device to retrieve the bitstream version of the plurality of bitstream versions that was compiled based on the current bitstream.
  • Example 10 includes the subject matter of any of Examples 1-9, and further including the FPGA, wherein the FPGA is on a multi-chip package with the one or more processors.
  • Example 11 includes the subject matter of any of Examples 1-10, and further including the FPGA, wherein the FPGA is on a system-on-a-chip with the one or more processors.
  • Example 13 includes the subject matter of Example 12, and further including a plurality of logic gates, wherein the bitstream loader is further to pause operation of the plurality of logic gates prior to writing, for each bitstream location of the plurality of bitstream locations, the value of the new bitstream at the corresponding bitstream location to the corresponding FPGA memory location; and resume operation of the plurality of logic gates after writing, for each bitstream location of the plurality of bitstream locations, the value of the new bitstream at the corresponding bitstream location to the corresponding FPGA memory location, wherein to resume operation of the plurality of logic gates comprises to resume operation of the plurality of logic gates without waiting for any other write operation to FPGA memory locations associated with the plurality of logic gates.
  • Example 14 includes the subject matter of any of Examples 12 and 13, and further including a plurality of logic gates, wherein the residual codestream does not include values of the new bitstream at locations other than the plurality of locations.
  • Example 17 includes the subject matter of any of Examples 15 and 16, and wherein generating the residual bitstream comprises performing a bit-wise exclusive or operation on the new bitstream and the current bitstream.
  • Example 18 includes the subject matter of any of Examples 15-17, and further including encoding a one-bit flag in the residual codestream, wherein the one-bit flag represents a residual encoding type for the residual codestream.
  • Example 28 includes the subject matter of any of Examples 26 and 27, and wherein the residual codestream does not include values of the new bitstream at locations other than the plurality of locations.
  • Example 29 includes one or more computer-readable media comprising a plurality of instructions stored thereon that, when executed, cause a compute device to perform the method of any of Examples 15-28.
  • Example 35 includes the subject matter of any of Examples 30-34, and wherein the bitstream is a partial bitstream.
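The residual codestream generation of method 500 (blocks 502 through 528) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: it assumes each bitstream value is one byte, takes the residual bitstream as the byte-wise XOR of the two bitstreams, and represents the codestream as a list of tuples, (0, count) for a skip command and (1, value, count) for an overwrite command, using the "such as 0" and "such as 1" token values mentioned above. The source does not specify a packed binary layout, and the function names are hypothetical.

```python
from typing import List, Tuple, Union

# Assumed token values, following the "such as 0" / "such as 1" examples.
SKIP = 0       # pair (SKIP, count): locations unchanged, keep current value
OVERWRITE = 1  # triple (OVERWRITE, value, count): write `value` `count` times

Command = Union[Tuple[int, int], Tuple[int, int, int]]


def residual_bitstream(new: bytes, current: bytes) -> bytes:
    """Byte-wise XOR of two equal-length bitstreams; 0x00 means 'unchanged'."""
    if len(new) != len(current):
        raise ValueError("bitstreams must have the same length")
    return bytes(a ^ b for a, b in zip(new, current))


def residual_codestream(new: bytes, current: bytes) -> List[Command]:
    """Encode the differences between `new` and `current` as skip/overwrite commands."""
    residual = residual_bitstream(new, current)
    out: List[Command] = []
    i, n = 0, len(residual)
    while i < n:  # block 528: more residual values remain
        j = i
        if residual[i] == 0x00:  # block 506: zero residual
            # Blocks 510-516: count consecutive zero residuals, emit one skip.
            while j < n and residual[j] == 0x00:
                j += 1
            out.append((SKIP, j - i))
        else:
            # Blocks 518-526: count consecutive non-zero residuals that map to
            # the same new-bitstream value, even if the residuals themselves differ.
            value = new[i]
            while j < n and residual[j] != 0x00 and new[j] == value:
                j += 1
            out.append((OVERWRITE, value, j - i))
        i = j  # advance the index by the number of values skipped or overwritten
    return out
```

For example, with current bytes 01 01 01 02 03 04 and new bytes 01 01 01 09 09 04, the residuals at indices 3 and 4 differ (0x0B and 0x0A) but both correspond to the new value 0x09, so the sketch emits [(0, 3), (1, 9, 2), (0, 1)], matching the observation in block 518 that different non-zero residuals may share one overwrite command.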

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)

Abstract

Technologies for rapid reconfiguration of a field programmable gate array (FPGA) are disclosed. A compute device, which may include the FPGA, may select a new bitstream to load into the FPGA. The compute device may compare the new bitstream against a current bitstream corresponding to the current configuration of the FPGA and determine the differences between the new bitstream and the current bitstream. The compute device may then encode the differences between the new bitstream and the current bitstream into a codestream and load the codestream into the FPGA. The FPGA may then decode the codestream and update those memory locations of the FPGA, such as settings of logic blocks and interconnects, that are not the same between the new bitstream and the current bitstream.

Description

TECHNOLOGIES FOR RAPID CONFIGURATION OF FIELD-PROGRAMMABLE GATE ARRAYS
BACKGROUND
Field-programmable gate arrays (FPGAs) are typically semiconductor devices or integrated circuits that are capable of being configured (or reconfigured) after manufacture. FPGAs may contain numerous logic blocks that are each programmable using a specific configuration. This configuration may be in the form of instructions that are authored in a specific language. The instructions may be loaded into the FPGA in various ways.
Circumstances may necessitate reconfiguration of the FPGA (e.g., due to updated instructions being available, performance problems, errors, or other issues). Some known systems require a complete overwrite of an existing configuration with an updated configuration before one or more logic blocks of the FPGA can resume operation. This may be the case even when the updated configuration is similar to the current configuration of the FPGA. Therefore, in some high performance applications that require fast or frequent updating of logic blocks, the reconfiguration time cost may be significant and a fast reconfiguration design may be desirable.
BRIEF DESCRIPTION OF THE DRAWINGS
The concepts described herein are illustrated by way of example and not by way of limitation in the accompanying figures. For simplicity and clarity of illustration, elements illustrated in the figures are not necessarily drawn to scale. Where considered appropriate, reference labels have been repeated among the figures to indicate corresponding or analogous elements.
FIG. 1 is a simplified block diagram of at least one embodiment of a compute device for rapid reconfiguration of an FPGA device;
FIG. 2 is a simplified block diagram of at least one embodiment of an environment that may be established by the compute device of FIG. 1;
FIGS. 3 and 4 are simplified flow diagrams of at least one embodiment of a method for rapid reconfiguration of an FPGA device that may be performed by the compute device of FIGS. 1 and 2;
FIG. 5 is a simplified flow diagram of at least one embodiment of a method for generating a residual codestream that may be performed by the compute device of FIGS. 1 and 2; and
FIG. 6 is a simplified flow diagram of at least one embodiment of a method for writing a codestream to an FPGA that may be performed by the FPGA of FIG. 1.
DETAILED DESCRIPTION OF THE DRAWINGS
While the concepts of the present disclosure are susceptible to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and will be described herein in detail. It should be understood, however, that there is no intent to limit the concepts of the present disclosure to the particular forms disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives consistent with the present disclosure and the appended claims.
References in the specification to “one embodiment,” “an embodiment,” “an illustrative embodiment,” etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may or may not necessarily include that particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described. Additionally, it should be appreciated that items included in a list in the form of “at least one of A, B, and C” can mean (A); (B); (C); (A and B); (A and C); (B and C); or (A, B, and C). Similarly, items listed in the form of “at least one of A, B, or C” can mean (A); (B); (C); (A and B); (A and C); (B and C); or (A, B, and C).
The disclosed embodiments may be implemented, in some cases, in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried by or stored on a transitory or non-transitory machine-readable (e.g., computer-readable) storage medium, which may be read and executed by one or more processors. A machine-readable storage medium may be embodied as any storage device, mechanism, or other physical structure for storing or transmitting information in a form readable by a machine (e.g., a volatile or non-volatile memory, a media disc, or other media device).
In the drawings, some structural or method features may be shown in specific arrangements and/or orderings. However, it should be appreciated that such specific arrangements and/or orderings may not be required. Rather, in some embodiments, such features may be arranged in a different manner and/or order than shown in the illustrative figures. Additionally, the inclusion of a structural or method feature in a particular figure is not meant to imply that such feature is required in all embodiments and, in some embodiments, may not be included or may be combined with other features.
Referring now to FIG. 1, in an illustrative embodiment, a compute device 106 includes a field programmable gate array (FPGA) 120. The compute device 106 may upload a bitstream to the FPGA 120 in order to run a program on the FPGA 120. The compute device 106 may then upload a new bitstream, which may be similar to the present bitstream of the FPGA 120. In order to more efficiently transfer the new bitstream and more efficiently update the FPGA 120, the compute device 106 may determine the differences between the new bitstream and the present bitstream. The compute device 106 may encode the differences into a codestream, which may be of a much smaller size than the bitstream. The compute device 106 provides the codestream to the FPGA 120, which may then update the configuration of the FPGA 120 based on the differences between the new bitstream and the present bitstream, which may require less writing and, therefore, less time as compared to rewriting the entire configuration of the FPGA 120.
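The update step in which the FPGA 120 writes only the locations that differ (method 600 of FIG. 6) can be sketched as follows. This minimal Python sketch models the FPGA configuration memory as a bytearray and assumes a codestream represented as a list of tuples: (0, count) skip and (1, value, count) overwrite commands for residual encoding, and (value, count) run-length pairs for compression encoding. It also assumes the flag convention from FIG. 4 (cleared for residual encoding, set for compression encoding), passed as a separate argument rather than parsed as a leading bit. Pausing and resuming the logic gates is not modeled, and all names are hypothetical. The function returns the number of locations written, which shows why a residual codestream may require less writing than a full rewrite.

```python
from typing import Sequence

RESIDUAL_FLAG = 0     # assumed: flag cleared means residual encoding
COMPRESSION_FLAG = 1  # assumed: flag set means compression encoding
SKIP, OVERWRITE = 0, 1


def _fill(config: bytearray, start: int, value: int, count: int) -> None:
    """Write `count` copies of `value` at `start` without growing the memory."""
    if start + count > len(config):
        raise ValueError("codestream runs past the end of configuration memory")
    config[start:start + count] = bytes([value]) * count


def write_codestream(config: bytearray, flag: int, codestream: Sequence[tuple]) -> int:
    """Apply a codestream to `config` in place; return the number of locations written."""
    writes = 0
    i = 0
    if flag == RESIDUAL_FLAG:  # block 610: residual decoding
        for entry in codestream:
            if entry[0] == SKIP:
                i += entry[1]  # unchanged locations keep their current value
            elif entry[0] == OVERWRITE:
                _, value, count = entry
                _fill(config, i, value, count)  # block 612: write only differences
                i += count
                writes += count
            else:
                raise ValueError(f"unknown residual command {entry[0]!r}")
    elif flag == COMPRESSION_FLAG:  # block 614: decompression
        for value, count in codestream:
            _fill(config, i, value, count)  # block 616: rewrite every location
            i += count
            writes += count
    else:
        raise ValueError(f"unknown encoding flag {flag!r}")
    return writes
```

Applying the residual codestream [(0, 3), (1, 9, 2), (0, 1)] to the configuration 01 01 01 02 03 04 writes two locations, while the equivalent compression codestream [(1, 3), (9, 2), (4, 1)] rewrites all six to reach the same result.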
The compute device 106 may be embodied as any type of computation or computer device capable of performing the functions described herein, including, without limitation, a computer, a server, a rack-mounted server, a workstation, a desktop computer, a laptop computer, a notebook computer, a tablet computer, a mobile computing device, a wearable computing device, a network appliance, a web appliance, a distributed computing system, a processor-based system, and/or a consumer electronic device. As shown in FIG. 1, the compute device 106 illustratively includes a central processing unit (CPU) 110, a main memory 112, an input/output (I/O) subsystem 114, communication circuitry 116, the FPGA 120, and data storage 122. Of course, in other embodiments, the compute device 106 may include other or additional components, such as those commonly found in a computer (e.g., a display, various peripheral devices, etc.). Additionally, in some embodiments, one or more of the illustrative components may be incorporated in, or otherwise form a portion of, another component. For example, in some embodiments, the main memory 112, or portions thereof, may be incorporated in the CPU 110.
The CPU 110 may be embodied as any type of processor capable of performing the functions described herein. The CPU 110 may be embodied as a single or multi-core processor(s), a microcontroller, or other processor or processing/controlling circuit. In some embodiments, the CPU 110 may be embodied as, include, or be coupled to a field programmable gate array (FPGA), an application specific integrated circuit (ASIC), a graphics processing unit (GPU), reconfigurable hardware or hardware circuitry, or other specialized hardware to facilitate performance of the functions described herein. Similarly, the main memory 112 may be embodied as any type of volatile (e.g., dynamic random access memory (DRAM), etc.) or non-volatile memory or data storage capable of performing the functions described herein. In some embodiments, all or a portion of the main memory 112 may be integrated into the CPU 110. In operation, the main memory 112 may store various data and software used during operation of the compute device 106 such as packet data, operating systems, applications, programs, libraries, and drivers. The main memory 112 is communicatively coupled to the CPU 110 via the I/O subsystem 114, which may be embodied as circuitry and/or components to facilitate input/output operations with the CPU 110, the main memory 112, and other components of the compute device 106. For example, the I/O subsystem 114 may be embodied as, or otherwise include, memory controller hubs, input/output control hubs, sensor hubs, firmware devices, communication links (e.g., point-to-point links, bus links, wires, cables, light guides, printed circuit board traces, etc.) and/or other components and subsystems to facilitate the input/output operations. In some embodiments, the I/O subsystem 114 may form a portion of a system-on-a-chip (SoC) and be incorporated, along with the CPU 110, the main memory 112, and other components of the compute device 106, on a single integrated circuit chip.
In addition, the compute device 106, in the illustrative embodiment, includes the FPGA 120. The FPGA 120 may be embodied as any hardware device (e.g., a co-processor, an FPGA, an ASIC, or circuitry) capable of performing functions that include rapid reconfiguration of FPGAs. More specifically, the FPGA 120 is any device capable of performing the rapid FPGA reconfiguration scheme described with respect to FIG. 6 below. In the illustrative embodiment, the FPGA 120 contains an array of programmable logic blocks with a hierarchy of reconfigurable interconnects that allow the logic blocks to be wired together. The illustrative programmable logic blocks may be configured to perform complex combinational functions or merely simple logic gates like AND, OR, XOR, NAND, etc. The specific configuration of an FPGA 120 includes the settings for the logic gates and interconnects. A configuration of the FPGA 120 may be expressed as a bitstream with each bit of the bitstream corresponding to a particular setting, register, or bit value of the FPGA 120.
The communication circuitry 116 may be embodied as any communication circuit, device, or collection thereof, capable of enabling communications over a network between the compute device 106 and other components such as other compute devices 106. The communication circuitry 116 may be configured to use any one or more communication technologies (e.g., wired or wireless communications) and associated protocols (e.g., Ethernet, WiMAX, etc.) to effect such communication.
The illustrative communication circuitry 116 includes a network interface controller (NIC) 118, which may also be referred to as a host fabric interface (HFI). The NIC 118 may be embodied as one or more add-in-boards, daughtercards, network interface cards, controller chips, chipsets, or other devices that may be used by the compute device 106 to communicate with other components such as other compute devices 106. In some embodiments, the NIC 118 may be embodied as part of a system-on-a-chip (SoC) that includes one or more processors, or included on a multichip package that also contains one or more processors. In some embodiments, the NIC 118 may include a local processor (not shown) and/or a local memory (not shown) that are both local to the NIC 118. In such embodiments, the local processor of the NIC 118 may be capable of performing one or more of the functions of the CPU 110 described herein. Additionally or alternatively, in such embodiments, the local memory of the NIC 118 may be integrated into one or more components of the compute device 106 at the board level, socket level, chip level, and/or other levels.
The compute device 106 may additionally include a data storage device 122, which may be embodied as any type of device or devices configured for short-term or long-term storage of data such as, for example, memory devices and circuits, memory cards, hard disk drives, solid-state drives, or other data storage devices. The data storage device 122 may include a system partition that stores data and firmware code for the compute device 106. The data storage device 122 may also include an operating system partition that stores data files and executables for an operating system of the compute device 106.
Additionally, the compute device 106 may include a display 124. The display 124 may be embodied as, or otherwise use, any suitable display technology including, for example, a liquid crystal display (LCD), a light emitting diode (LED) display, a cathode ray tube (CRT) display, a plasma display, and/or other display usable in a compute device. The display may include a touchscreen sensor that uses any suitable touchscreen input technology to detect the user’s tactile selection of information displayed on the display including, but not limited to, resistive touchscreen sensors, capacitive touchscreen sensors, surface acoustic wave (SAW) touchscreen sensors, infrared touchscreen sensors, optical imaging touchscreen sensors, acoustic touchscreen sensors, and/or other types of touchscreen sensors. Additionally or alternatively, the compute device 106 may include one or more peripheral devices 126. Such peripheral devices 126 may include any type of peripheral device commonly found in a compute device such as speakers, a mouse, a keyboard, and/or other input/output devices, interface devices, and/or other peripheral devices. It should be appreciated that the compute device 106 may include other components, sub-components and/or devices commonly found in a compute device, which are not illustrated in FIG. 1 for clarity of the description.
Referring now to FIG. 2, in the illustrative embodiment, the compute device 106 may establish an environment 200 during operation. The illustrative environment 200 includes an FPGA manager 220, a bitstream manager 230, a residual manager 240, and a codestream generation manager 250. In addition, the illustrative environment 200 also includes the FPGA 120, described above with respect to FIG. 1. The FPGA 120 includes a bitstream receiver 260, a bitstream decoder 270, and a bitstream loader 280. Each of the components of the environment 200 may be embodied as hardware, firmware, software, or a combination thereof. As such, in some embodiments, one or more of the components of the environment 200 may be embodied as circuitry or a collection of electrical devices (e.g., FPGA manager circuitry 220, bitstream manager circuitry 230, residual manager circuitry 240, codestream generation manager circuitry 250, etc.). It should be appreciated that, in such embodiments, one or more of the FPGA manager circuitry 220, bitstream manager circuitry 230, residual manager circuitry 240, and codestream generation manager circuitry 250 may form a portion of one or more of the CPU 110, main memory 112, I/O subsystem 114, communication circuitry 116 and/or other components of the compute device 106. Additionally, in some embodiments, one or more of the illustrative components may form a portion of another component and/or one or more of the illustrative components may be independent of one another. Further, in some embodiments, one or more of the components of the environment 200 may be embodied as virtualized hardware components or emulated architecture, which may be established and maintained by the CPU 110 or other components of the compute device 106.
In the illustrative environment 200, the compute device 106 also includes stream data 202, residual codestream data 204, and compression codestream data 206. The stream data 202 may be embodied as any data that is indicative of one or more bitstreams intended for or transmitted to an FPGA during reconfiguration. The stream data 202, in the illustrative embodiment, also includes data indicative of codestreams that are generated using any bitstreams intended for an FPGA. The residual codestream data 204 may be embodied as any data that is indicative of differences between two bitstreams, such as the differences between a new bitstream and a bitstream of a present configuration of the FPGA 120. The residual codestream data 204 may also be embodied as any data that is generated by one or more residual encoding algorithms that may be employed to generate a residual codestream, a residual bitstream, or the like. The compression codestream data 206 may be embodied as any data usable to encode FPGA configurations into a codestream using techniques that do not require generation of a residual bitstream. For example, the compression codestream data 206 may be embodied as any data that is generated by one or more compression algorithms. The stream data 202, residual codestream data 204, and compression codestream data 206 may be accessed by the various components and/or sub-components of the compute device 106.
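The compression path, which needs no residual bitstream, and the choice between the two encodings described for method 300 can be sketched as follows. This minimal Python sketch assumes run-length encoding as the compression algorithm (the definitions above name it as one option), measures codestream length as the total number of fields across all tuples, and uses the flag convention of a cleared bit for residual encoding and a set bit for compression encoding. The residual codestream is taken as an already-computed list of tuples; the flag is returned next to the codestream rather than packed as a leading bit, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

RESIDUAL_FLAG = 0     # flag cleared: residual encoding was used
COMPRESSION_FLAG = 1  # flag set: compression encoding was used


def compression_codestream(bitstream: bytes) -> List[Tuple[int, int]]:
    """Run-length encode a bitstream as (value, run_length) pairs."""
    out: List[Tuple[int, int]] = []
    for value in bitstream:
        if out and out[-1][0] == value:
            out[-1] = (value, out[-1][1] + 1)  # extend the current run
        else:
            out.append((value, 1))  # start a new run
    return out


def codestream_length(codestream: Sequence[tuple]) -> int:
    """Hypothetical length metric: total number of fields in the codestream."""
    return sum(len(entry) for entry in codestream)


def select_codestream(residual: Sequence[tuple], new_bitstream: bytes) -> Tuple[int, list]:
    """Return (flag, codestream), preferring residual only if it is strictly shorter."""
    compressed = compression_codestream(new_bitstream)
    if codestream_length(residual) < codestream_length(compressed):
        return RESIDUAL_FLAG, list(residual)
    return COMPRESSION_FLAG, compressed
```

With this metric, a single changed byte in an otherwise non-repetitive bitstream favors residual encoding, while a highly repetitive new bitstream may favor compression even when it differs little from the current one. A real implementation would compare the lengths of the packed binary codestreams instead.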
The FPGA manager 220, which may be embodied as hardware, firmware, software, virtualized hardware, emulated architecture, and/or a combination thereof as discussed above, is configured to facilitate management of connected FPGA devices. More specifically, the FPGA manager 220 may be configured to load updated bitstreams into connected FPGA devices using communication techniques such as streaming. The FPGA manager 220 may schedule when certain configurations of the FPGA 120 are loaded onto the FPGA 120, how long a particular configuration may be executed by the FPGA 120, and the order in which different configurations of the FPGA 120 are executed. The FPGA manager 220 may also provide input to the FPGA 120, receive output from the FPGA 120, and the like. In some embodiments, the FPGA manager 220 may perform a partial reconfiguration of the FPGA 120 by providing a bitstream corresponding to a certain subset of the settings, registers, or bit values of the FPGA 120.
The bitstream manager 230, which may be embodied as hardware, firmware, software, virtualized hardware, emulated architecture, and/or a combination thereof as discussed above, is configured to manage bitstreams that may be loaded onto the FPGA 120. As used herein, ‘bitstream’ refers to a sequence of bits (or binary values) that can be serially transmitted from one compute device to another. A bitstream may refer to the sequence of bits before being transmitted to the FPGA 120 or after the values of the bitstream are loaded into the FPGA 120, and the bitstream is not limited to referring to the sequence of bits when the sequence is actually being transmitted. A bitstream may either be a complete bitstream, which specifies the setting for every logic block and reconfigurable interconnect of the FPGA 120, or a partial bitstream, which specifies the setting for every logic block and reconfigurable interconnect of only a portion of the FPGA 120. It should be appreciated that the techniques and algorithms disclosed herein may be applied to a complete reconfiguration of the FPGA 120 or to a partial reconfiguration of the FPGA 120. The bitstream manager 230 may generate bitstreams, such as by compiling a program made up of instructions that are written using a particular language (e.g., a programming language or a hardware description language). The bitstream manager 230 is configured to convert these instructions into a stream or series of bits (or bytes, or hexadecimal values, or the like). The bitstream manager 230 may also generate a bitstream based on the current configuration for the FPGA 120. In some embodiments, when compiling a program to generate a bitstream, the bitstream manager 230 may be configured to generate a bitstream from the program in such a way that the generated bitstream has fewer differences with a known bitstream, such as a bitstream currently loaded on the FPGA 120, than might otherwise be the case.
For example, the bitstream manager 230 may compile a segment of the program and determine a configuration of a portion of the FPGA 120, such as the settings associated with a set of logic blocks near each other. The bitstream manager 230 may compare the configuration of that set of logic blocks with the current configuration of the FPGA 120. If the settings of a portion of the current configuration of the FPGA 120 are similar or identical to the settings of the set of logic blocks, then the bitstream manager 230 may place that set of logic blocks in the same physical location as that portion of the current configuration. It should be appreciated that such an approach may reduce or minimize the differences between the bitstream associated with the current configuration of the FPGA 120 and the bitstream that the bitstream manager 230 generates. In some embodiments, the bitstream manager 230 may generate several versions of a bitstream for the same program, such as by generating one version based on a comparison with each of several other bitstreams. When loading a version of the bitstream of the program onto the FPGA 120, the version may be selected based on which of the several other bitstreams is currently loaded on the FPGA 120 or based on which of the several versions has the fewest differences with the bitstream currently loaded on the FPGA 120. Additionally or alternatively, the bitstream manager 230 may generate the bitstream of a program shortly before loading it onto the FPGA 120, generating the bitstream so that it has relatively few differences with the bitstream currently loaded on the FPGA 120.
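Selecting among several precompiled versions by fewest differences can be illustrated with a minimal Python sketch. It assumes that every version and the current bitstream have the same length and that differences are counted per byte rather than per bit; the function and version names are hypothetical and are not taken from any FPGA toolchain.

```python
from __future__ import annotations

# Sketch only: byte-level comparison and all names here are assumptions.


def count_differences(version: bytes, current: bytes) -> int:
    """Count byte positions at which two equal-length bitstreams differ."""
    if len(version) != len(current):
        raise ValueError("bitstreams must have the same length")
    return sum(1 for a, b in zip(version, current) if a != b)


def select_version(versions: dict[str, bytes], current: bytes) -> str:
    """Return the key of the stored version with the fewest differences.

    Ties are resolved in favor of the first version in insertion order.
    """
    return min(versions, key=lambda name: count_differences(versions[name], current))


current = bytes([0x10, 0x20, 0x30, 0x40])
versions = {
    "compiled_against_A": bytes([0x10, 0x20, 0x31, 0x40]),  # 1 difference
    "compiled_against_B": bytes([0x11, 0x21, 0x31, 0x41]),  # 4 differences
}
print(select_version(versions, current))  # compiled_against_A
```

When the version compiled against the currently loaded bitstream is known, it can be looked up directly; the difference count is only needed when that association has not been recorded.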
The residual manager 240, which may be embodied as hardware, firmware, software, virtualized hardware, emulated architecture, and/or a combination thereof as discussed above, is configured to generate residual bitstreams by comparing two bitstreams for the FPGA 120. For example, the residual manager 240 may be configured to receive an updated bitstream intended for an FPGA device, identify the FPGA device from a group of connected devices, and retrieve the current bitstream of the current configuration of the FPGA device. The residual manager 240 may be further configured to generate a residual bitstream using the current bitstream and the updated bitstream. The residual bitstream indicates the differences between the two bitstreams being compared. For example, the current bitstream and the updated bitstream may be parsed into a series of values, with each value being assigned an index, counter, or other referential indicator to indicate the position of that value in the series. After assigning a referential indicator to each value in both bitstreams as described above, the residual manager 240 compares the values at corresponding indices of the two bitstreams. For example, the residual manager 240 compares the value at index 0 of the current bitstream with the value at index 0 of the updated bitstream. If the values are identical, a zero residual value may be generated; in other words, the zero residual value indicates that both bitstreams have the same value at the corresponding index. If the values are not identical, a non-zero residual value may be generated. In the illustrative embodiment, the non-zero residual value may indicate the difference between the value of the current bitstream and the value of the updated bitstream (such as the result of an exclusive or (XOR) operation between the values) or may indicate the value of the updated bitstream.
It should be appreciated that, in some cases, the residual bitstream may use a zero residual value at a certain index to indicate that the current bitstream and the updated bitstream differ at that index and that zero is the updated value. The values at each index of the bitstreams may be one or more bits and may be represented, for example, in hexadecimal. Where the values in each configuration are in hexadecimal form, the residual may also be generated as a hexadecimal value. Accordingly, a zero residual value may be represented as 0x00, and a non-zero residual value may be represented as, for example, 0x7f, 0x12, or the like.
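The XOR variant of the residual can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the embodiment itself: values are taken one byte at a time, both bitstreams have equal length, and the function names are hypothetical.

```python
# Sketch only: byte-sized values and these function names are assumptions.


def residual_bitstream(current: bytes, updated: bytes) -> bytes:
    """Byte-wise XOR residual: 0x00 where the bitstreams agree, non-zero where they differ."""
    if len(current) != len(updated):
        raise ValueError("bitstreams must have the same length")
    return bytes(c ^ u for c, u in zip(current, updated))


def apply_residual(current: bytes, residual: bytes) -> bytes:
    """Recover the updated bitstream, since (c ^ u) ^ c == u."""
    if len(current) != len(residual):
        raise ValueError("bitstreams must have the same length")
    return bytes(c ^ r for c, r in zip(current, residual))


current = bytes([0x12, 0x34, 0x56, 0x78])
updated = bytes([0x12, 0x34, 0x00, 0x7F])
residual = residual_bitstream(current, updated)
print(residual.hex(" "))  # 00 00 56 07
assert apply_residual(current, residual) == updated
```

With XOR, an updated value of 0x00 that differs from the current value still yields a non-zero residual (0x56 at index 2), so the case in which a zero residual marks a changed location arises only when residual values carry the updated value directly.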
The codestream generation manager 250, which may be embodied as hardware, firmware, software, virtualized hardware, emulated architecture, and/or a combination thereof as discussed above, is configured to generate a codestream from bitstreams that are being generated and manipulated by the bitstream manager 230 and the residual manager 240. For example, the residual manager 240 will generate a residual bitstream that will be used to generate a codestream that can then be loaded into the FPGA 120 as the updated configuration. The codestream generation manager 250 may be configured to generate codestreams using more than one algorithm. For example, the codestream generation manager 250 may perform a compression algorithm such as run-length encoding on the bitstream to generate a compression codestream. The compression codestream can be decompressed to generate the bitstream without any additional information such as information indicating differences with another bitstream. In the illustrative embodiment, a compression codestream generated by the codestream generation manager 250 with use of a run-length encoding may be embodied as a series of pairs of values, with the first value of each pair indicating a bitstream value of the bitstream and the second value of each pair indicating the number of consecutive values of the bitstream that are the first value. The codestream generation manager 250 may also encode the bitstream into a residual codestream using the residual bitstream, such as by indicating which locations of the updated bitstream are different from the current bitstream in a compressed form. In the illustrative embodiment, as described in more detail below in regard to FIG. 5, a residual codestream may be embodied as a series of pairs of values or triples of values. 
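The run-length compression codestream described above, a series of (value, count) pairs, can be sketched in Python as follows. Values are taken one byte at a time and run lengths are unbounded Python integers; both are assumptions, since a hardware format would fix the width of each field. The function names are hypothetical.

```python
from __future__ import annotations

# Sketch only: byte-sized values, unbounded counts, and names are assumptions.


def rle_encode(bitstream: bytes) -> list[tuple[int, int]]:
    """Compression codestream as (value, run_length) pairs."""
    pairs: list[tuple[int, int]] = []
    for value in bitstream:
        if pairs and pairs[-1][0] == value:
            # Extend the current run of identical values.
            pairs[-1] = (value, pairs[-1][1] + 1)
        else:
            # Start a new run.
            pairs.append((value, 1))
    return pairs


def rle_decode(pairs: list[tuple[int, int]]) -> bytes:
    """Rebuild the bitstream without reference to any other bitstream."""
    return b"".join(bytes([value]) * count for value, count in pairs)


stream = bytes([0xFF, 0xFF, 0xFF, 0x00, 0x7F, 0x7F])
codestream = rle_encode(stream)
print(codestream)  # [(255, 3), (0, 1), (127, 2)]
assert rle_decode(codestream) == stream
```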
Each pair of values includes a first value (such as a 0) indicating that there is not a difference between the current bitstream and the updated bitstream and a second value indicating the number of consecutive locations that are not different. Each triple of values includes a first value (such as a 1) indicating that there is a difference between the current bitstream and the updated bitstream, a second value indicating a value of the updated bitstream, and a third value indicating the number of consecutive values of the updated bitstream that have the second value. Of course, it should be appreciated that each of the compression codestream and the residual codestream may include an indication of which type of encoding is used, such as a single bit at the beginning of the codestream. In some embodiments, the codestream generation manager 250 may generate both a compression codestream and a residual codestream of the updated bitstream, and provide the smaller of the two to the FPGA manager 220. Additionally or alternatively, the codestream generation manager 250 may generate only one codestream based on, e.g., an indication from the user or from another component of the compute device 106 that the updated bitstream is similar to the current bitstream (in which case the residual codestream may be used) or that the updated bitstream is not similar to the current bitstream (in which case the compression codestream may be used). In some embodiments, whether the residual codestream is used may be based on whether there is a residual bitstream stored in the stream data 202 that was generated based on the current bitstream of the FPGA 120. If there is such a residual bitstream, then the residual codestream may be used.
The FPGA 120 is configured to acquire a bitstream and load it as a new or updated configuration. The FPGA 120 may accept input from another component of the compute device 106 such as the FPGA manager 220, operate the logic gates of the current configuration of the FPGA 120, and provide output to components of the compute device 106 such as the FPGA manager 220.
The bitstream receiver 260 is configured to receive a bitstream from another component of the compute device 106 such as the FPGA manager 220. In the illustrative embodiment, the bitstream receiver 260 of the FPGA 120 is configured to receive an encoded bitstream, such as a residual codestream of an updated bitstream or a compressed codestream of the updated bitstream. The bitstream decoder 270 of the FPGA 120 is configured to decode the encoded bitstream. The bitstream decoder 270 may decode the encoded bitstream by first determining what encoding is used and then decoding the bitstream. The bitstream loader 280 is configured to update the configuration of the FPGA 120 by writing values of the bitstream to the corresponding settings, registers, or bit values of the FPGA 120 based on the decoded bitstream. If the encoded bitstream is a residual codestream, the bitstream loader 280 may only write values of the bitstream that are indicated to be different from the current bitstream.
It should be appreciated that each of the FPGA manager 220, bitstream manager 230, residual manager 240, and codestream generation manager 250 may be separately embodied as hardware, firmware, software, virtualized hardware, emulated architecture, and/or a combination thereof. For example, the FPGA manager 220 may be embodied as a hardware component, while the bitstream manager 230 and the codestream generation manager 250 may be embodied as virtualized hardware components or as some other combination of hardware, firmware, software, virtualized hardware, emulated architecture, and/or a combination thereof.
Referring now to FIG. 3, in use, the compute device 106 may execute a method 300 for rapid reconfiguration of an FPGA device. It should be appreciated that, in some embodiments, the operations of the method 300 may be performed by one or more components of the environment 200 of the compute device 106 as shown in FIG. 2. The method 300 begins with block 302, in which the compute device 106 determines whether to load a new program (e.g., an updated bitstream) into the FPGA 120. In the illustrative embodiment, the compute device 106 may determine to perform the rapid FPGA reconfiguration in response to determining that the FPGA manager 220 has received a new or updated bitstream or program for the FPGA 120, in response to a determination that a setting stored in a configuration file in the data storage device 122 indicates to perform the rapid FPGA reconfiguration, and/or as a function of other criteria. If the compute device 106 is not to load a new program, the method 300 loops back to block 302. If the compute device 106 is to load a new program, the method 300 advances to block 304, in which the compute device 106 selects a new program to load into the FPGA 120. In the illustrative embodiment, the program selection may be based on a variety of criteria. For example, the compute device 106 may have received an explicit instruction (e.g., from an operator or other compute device) that a new program should be loaded. Additionally or alternatively, the compute device 106 may have a schedule for when certain programs should be run on the FPGA 120 or other criteria indicating when certain programs should be run.
In block 306, the compute device 106 acquires the bitstream for the new program. The bitstream of the new program may be generated by the compute device 106 or may be stored in the compute device 106. As discussed in more detail above, in some embodiments, the compute device 106 may generate the bitstream of the new program so as to minimize or reduce the differences with a current bitstream loaded on the FPGA 120. The compute device 106 may have already generated a bitstream with relatively few differences to the current bitstream, and such a bitstream may be stored on the compute device 106. In some embodiments, the compute device 106 may have generated different versions of the same program, such as versions which have a low number of differences as compared to various known bitstreams. In such embodiments, the compute device 106 may choose a version of the program which is known to have a low number of differences as compared to the current bitstream of the FPGA 120, or the compute device 106 may determine which of the several versions of the program has the fewest differences as compared to the current bitstream and choose that version.
In block 308, the compute device 106 determines an encoding algorithm to use to encode the new bitstream before transmitting the new bitstream to the FPGA 120. In the illustrative embodiment, the compute device 106 may compare the length of a residual codestream and the length of a compression codestream and select the shorter codestream. To do so, the compute device 106 may generate a residual codestream of the new bitstream and the current bitstream in block 310. The compute device 106 may first calculate the residual of the new bitstream and the current bitstream in block 312. As discussed in more detail above, the residual bitstream indicates the differences between the two bitstreams being compared. In the illustrative embodiment, as described in more detail below in regard to FIG. 5, the residual bitstream may be embodied as the result of a bit-wise XOR operation on the two bitstreams being compared. The residual codestream is a compressed representation of the residual bitstream. In the illustrative embodiment, a residual codestream may be embodied as a series of pairs of values or triples of values. Each pair of values includes a first value (such as a 0) indicating that there is not a difference between the current bitstream and the updated bitstream and a second value indicating the number of consecutive locations that are not different. Each triple of values includes a first value (such as a 1) indicating that there is a difference between the current bitstream and the updated bitstream, a second value indicating a value of the updated bitstream, and a third value indicating the number of consecutive values of the updated bitstream that have the second value. In block 314, the compute device 106 may generate a compression codestream of the new bitstream using a compression algorithm, such as run-length encoding. The compression codestream can be decompressed to generate the bitstream without any additional information, such as information indicating differences with another bitstream.
In the illustrative embodiment, a compression codestream generated by the codestream generation manager 250 with use of a run-length encoding may be embodied as a series of pairs of values, with the first value of each pair indicating a bitstream value of the bitstream and the second value of each pair indicating the number of consecutive values of the bitstream that are the first value.
Additionally or alternatively, the compute device 106 may determine an encoding algorithm based on a factor other than the lengths of the residual codestream and the compression codestream. For example, the compute device 106 may receive an indication from a user of the compute device 106 indicating which encoding algorithm to use, or the compute device 106 may determine to use residual encoding based on the presence of a stored bitstream corresponding to the new program that is known to have a small number of differences with the current bitstream.
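The length-based determination of block 308 can be sketched as follows, assuming both codestreams have already been generated. Codestream length is measured here as the total number of values in the pairs and triples, and the encoding flag is stored as the first list element rather than as a packed bit; both are assumptions, and the names are hypothetical. The flag values follow the illustrative embodiment: cleared for residual encoding, set for compression encoding.

```python
from __future__ import annotations

# Sketch only: the length measure, list-based flag, and names are assumptions.
RESIDUAL_FLAG = 0     # flag cleared: residual encoding (block 320)
COMPRESSION_FLAG = 1  # flag set: compression encoding


def codestream_length(codestream: list[tuple[int, ...]]) -> int:
    """Hypothetical length measure: total number of values in all tuples."""
    return sum(len(token) for token in codestream)


def choose_codestream(residual_cs: list[tuple[int, ...]],
                      compression_cs: list[tuple[int, ...]]) -> list:
    """Prefix the shorter codestream with its one-bit encoding flag."""
    if codestream_length(residual_cs) < codestream_length(compression_cs):
        return [RESIDUAL_FLAG, *residual_cs]
    # On a tie, prefer compression, which does not depend on the current configuration.
    return [COMPRESSION_FLAG, *compression_cs]


# Current bitstream 01 02 03 04 05 06, new bitstream 01 02 03 04 05 07.
residual_cs = [(0, 5), (1, 7, 1)]  # skip 5 values, overwrite 1 value with 0x07
compression_cs = [(1, 1), (2, 1), (3, 1), (4, 1), (5, 1), (7, 1)]
print(choose_codestream(residual_cs, compression_cs))  # [0, (0, 5), (1, 7, 1)]
```

The residual codestream wins here because only one value changed, while the new bitstream itself has no runs for the compression codestream to exploit.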
The method 300 proceeds to block 316 in FIG. 4, in which, if the compute device 106 determined in block 308 not to use residual encoding, the method 300 proceeds to block 324, in which the compute device 106 generates a compression codestream of the new bitstream. If the compute device 106 determined in block 308 to use residual encoding, the method 300 proceeds to block 318, in which the compute device 106 generates a residual codestream of the new bitstream and the current bitstream, as described above in regard to block 310. Of course, in embodiments in which the compute device 106 generated the residual codestream as part of determining whether or not to use residual encoding, the compute device 106 need not regenerate the residual codestream in block 318.
In block 320, the compute device 106 provides an indication as part of the residual codestream that residual encoding was used. In the illustrative embodiment, the compute device 106 may add and clear a one-bit flag at the beginning of the codestream. After block 320, the method 300 proceeds to block 330, in which the compute device 106 loads the new codestream into the FPGA 120.
Referring back to block 316, if the compute device 106 determined in block 308 not to use residual encoding, the method 300 proceeds to block 324, in which the compute device 106 generates a compression codestream of the new bitstream, as described above in regard to block 314. Of course, in embodiments in which the compute device 106 generated the compression codestream as part of determining whether or not to use residual encoding, the compute device 106 need not regenerate the compression codestream in block 324.
In block 326, the compute device 106 provides an indication as part of the compression codestream that compression encoding was used. In the illustrative embodiment, the compute device 106 may add and set a one-bit flag at the beginning of the codestream.
In block 330, the compute device 106 loads the codestream into the FPGA 120. In some embodiments, the compute device 106 includes the one-bit flag in the codestream to denote the encoding type (e.g., residual encoding or compressive encoding). The method 300 may then loop back to block 302 in FIG. 3.
Referring now to FIG. 5, in use, the compute device 106 proceeds to execute a method 500 for generating a residual codestream from a residual bitstream. The residual bitstream may be a series of values that are each denoted by an index. An integer index i from 0 to some non-zero value n may be used. The method begins at block 502, in which the compute device 106 begins at the start (e.g., index = 0) of the residual bitstream. In block 504, the compute device 106 retrieves the residual value at the particular location in the residual bitstream indicated by the index. The method advances to block 506, in which the compute device 106 determines whether the residual value at the index i equals a zero residual value.
If the value in the residual bitstream at the index i equals a zero residual value, the method advances to block 510, in which the compute device 106 determines a number of consecutive residual values that are of value 0x00. The method advances to block 512, in which the compute device 106 encodes a ‘skip’ command in the residual codestream. The skip command indicates that the values at that index in the current bitstream and the updated bitstream are identical and so there is no need to encode the updated bitstream value. Therefore, the value at that index in the updated configuration bitstream may simply be skipped such that the current value at that index (i.e., the value from the current configuration) will be retained on the FPGA. In addition to encoding the skip command, the number of consecutive zero values to be skipped is also encoded at block 514.
The method advances to block 516, in which the compute device 106 increments the index i for the residual bitstream by the number of values skipped. The method advances to block 528, in which the compute device 106 determines whether the residual bitstream has more values.
Referring back to block 506, if the residual value at index = i is not 0x00, the method advances to block 518. The residual value being non-zero indicates that, at this index location, the value in the current configuration bitstream differs from that in the updated configuration bitstream. In block 518, the compute device 106 determines the number of consecutive non-zero residual values corresponding to the same new bitstream value. It should be appreciated that different consecutive non-zero residual values may correspond to the same consecutive value in the new bitstream, such as when the consecutive values of the new bitstream are the same but the consecutive values of the current bitstream are not the same.
In block 520, the compute device 106 encodes an overwrite command in the residual codestream. To encode the overwrite command, the method advances to block 522, in which the compute device 106 writes the value of the new bitstream at index i into the residual codestream. Additionally, in block 524, the compute device 106 specifies the number of consecutive values to be overwritten. The method then advances to block 526, in which the compute device 106 increments the index i by the number of values overwritten as specified in block 524. The method 500 then advances to block 528, in which the compute device 106 determines whether there are more values in the residual bitstream. If there are more values in the residual bitstream, the method 500 loops back to block 504. If there are not more values in the residual bitstream, then encoding the residual codestream is complete.
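The loop of method 500 can be sketched in Python as follows. The sketch assumes byte-sized values, an XOR residual, and the numeric tags 0 for a skip pair and 1 for an overwrite triple, matching the illustrative pair and triple formats; the function and constant names are hypothetical. Because block 522 takes the overwrite value from the new bitstream, the encoder receives the new bitstream alongside the residual.

```python
from __future__ import annotations

# Sketch only: tags, byte-sized values, and names are illustrative assumptions.
SKIP = 0       # first value of a skip pair: (SKIP, count)
OVERWRITE = 1  # first value of an overwrite triple: (OVERWRITE, value, count)


def encode_residual_codestream(residual: bytes, new: bytes) -> list[tuple[int, ...]]:
    """Encode a residual bitstream into skip pairs and overwrite triples."""
    if len(residual) != len(new):
        raise ValueError("residual and new bitstream must have the same length")
    codestream: list[tuple[int, ...]] = []
    n = len(residual)
    i = 0                                   # block 502: start at index 0
    while i < n:                            # block 528: more values?
        run = 1
        if residual[i] == 0x00:             # block 506: zero residual value
            # Block 510: count consecutive zero residual values.
            while i + run < n and residual[i + run] == 0x00:
                run += 1
            codestream.append((SKIP, run))  # blocks 512-514
        else:
            value = new[i]
            # Block 518: count consecutive non-zero residuals with the same new value.
            while i + run < n and residual[i + run] != 0x00 and new[i + run] == value:
                run += 1
            codestream.append((OVERWRITE, value, run))  # blocks 520-524
        i += run                            # blocks 516 and 526
    return codestream


current = bytes([0x12, 0x34, 0x56, 0x56, 0x00, 0x00])
new = bytes([0x12, 0x34, 0x7F, 0x7F, 0x00, 0x01])
residual = bytes(c ^ u for c, u in zip(current, new))  # 00 00 29 29 00 01
print(encode_residual_codestream(residual, new))
# [(0, 2), (1, 127, 2), (0, 1), (1, 1, 1)]
```

The inner overwrite loop compares new-bitstream values rather than residual values, so residuals such as 0x7e and 0x7d that both map to a new value of 0x7f still collapse into a single triple, as block 518 describes.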
Referring now to FIG. 6, in use, the FPGA 120 may execute a method 600 for performing a codestream write process. In the illustrative embodiment, the FPGA 120 is configured to determine whether to perform the codestream write process at block 602. If there is a determination to perform the codestream write process, the method advances to block 604, in which the FPGA 120 receives a codestream to be written to the FPGA 120.
At block 606, the FPGA 120 determines the encoding that was used to generate the codestream. For example, the FPGA 120 may parse the first bit of the codestream. As described above with respect to FIG. 4, the first bit of the codestream may represent the type of encoding that was used to generate the codestream (e.g., compressive encoding, residual encoding, or the like). In block 608, if a residual encoding is used, the method 600 advances to block 610, in which the FPGA 120 decodes the codestream using a residual decoding algorithm.
In block 612, the FPGA 120 writes, based on the decoded residual bitstream, the values of the new bitstream that differ from the current bitstream to the appropriate logic blocks and interconnects. It should be appreciated that the values of the new bitstream that are the same as the current bitstream do not need to be written. In some embodiments, the FPGA 120 may pause operation of the logic components of the FPGA 120 while writing the values in block 612. In such embodiments, the FPGA 120 will resume execution of the logic gates after writing the values in block 612 is complete. The method 600 then loops back to block 602, in which the FPGA 120 evaluates whether the codestream write process should be performed.
Referring back to block 608, if a residual encoding is not used, the method 600 advances to block 614, in which the FPGA 120 decodes the codestream using a decompression algorithm. In block 616, the FPGA 120 writes the values of the appropriate logic blocks and interconnects based on the decoded bitstream. In some embodiments, the FPGA 120 may pause operation of the logic components of the FPGA 120 while writing the values in block 616. In such embodiments, the FPGA 120 will resume execution of the logic gates after writing the values in block 616 is complete. The method 600 then loops back to block 602, in which the FPGA 120 evaluates whether the codestream write process should be performed.
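The decode-and-write path of method 600 can be sketched in Python, with a bytearray standing in for the FPGA configuration memory. The encoding flag is assumed to be the first list element (0 for residual encoding, 1 for compression encoding), followed by tuples in the illustrative formats: (0, count) skip pairs and (1, value, count) overwrite triples for residual codestreams, and (value, count) pairs for compression codestreams. Pausing and resuming the logic gates is not modeled, and all names are hypothetical.

```python
from __future__ import annotations

# Sketch only: flag values, tags, memory model, and names are assumptions.
RESIDUAL_FLAG, COMPRESSION_FLAG = 0, 1
SKIP, OVERWRITE = 0, 1


def _fill(config: bytearray, start: int, value: int, count: int) -> None:
    """Write `count` copies of `value` starting at `start`, without growing config."""
    if start + count > len(config):
        raise ValueError("codestream runs past the end of the configuration")
    config[start:start + count] = bytes([value]) * count


def write_codestream(config: bytearray, codestream: list) -> int:
    """Decode a flagged codestream into config in place; return locations written."""
    flag, *tokens = codestream              # block 606: read the encoding flag
    index = 0
    written = 0
    if flag == RESIDUAL_FLAG:               # blocks 610-612: residual decoding
        for token in tokens:
            if token[0] == SKIP:
                index += token[1]           # unchanged values are left in place
            elif token[0] == OVERWRITE:
                _, value, count = token
                _fill(config, index, value, count)
                written += count
                index += count
            else:
                raise ValueError(f"unknown residual command: {token!r}")
    elif flag == COMPRESSION_FLAG:          # blocks 614-616: decompression
        for value, count in tokens:
            _fill(config, index, value, count)
            written += count
            index += count
    else:
        raise ValueError(f"unknown encoding flag: {flag!r}")
    return written


config = bytearray([0x12, 0x34, 0x56, 0x56, 0x00, 0x00])
print(write_codestream(config, [0, (0, 2), (1, 0x7F, 2), (0, 1), (1, 0x01, 1)]))  # 3
print(config.hex(" "))  # 12 34 7f 7f 00 01
```

On the residual path only three of the six locations are written; the compression path writes every location it decodes.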
EXAMPLES
Illustrative examples of the technologies disclosed herein are provided below. An embodiment of the technologies may include any one or more, and any combination of, the examples described below.
Example 1 includes a compute device to reconfigure a field-programmable gate array (FPGA), the compute device comprising one or more processors; one or more memory devices having stored therein a plurality of instructions that, when executed by the one or more processors, cause the compute device to generate a residual bitstream based on a new bitstream for the FPGA and a current bitstream of a current configuration of the FPGA, wherein the residual bitstream indicates differences between the new bitstream and the current bitstream; generate a residual codestream corresponding to the residual bitstream, wherein the residual codestream indicates a plurality of bitstream locations which are different between the new bitstream and the current bitstream and indicates values of the new bitstream at each of the plurality of bitstream locations; and load the residual codestream into the FPGA.
Example 2 includes the subject matter of Example 1, and wherein the plurality of instructions, when executed by the one or more processors, further cause the compute device to generate a compression codestream corresponding to the new bitstream; compare a length of the compression codestream to a length of the residual codestream; and determine that the length of the residual codestream is less than the length of the compression codestream.
Example 3 includes the subject matter of any of Examples 1 and 2, and wherein to generate the residual bitstream comprises to perform a bit-wise exclusive or operation on the new bitstream and the current bitstream.
Example 4 includes the subject matter of any of Examples 1-3, and wherein the plurality of instructions, when executed by the one or more processors, further cause the compute device to encode a one-bit flag in the residual codestream, wherein the one-bit flag represents a residual encoding type for the residual codestream.
Example 5 includes the subject matter of any of Examples 1-4, and wherein the plurality of instructions further cause the compute device to acquire a new program, wherein the new program comprises a plurality of instructions; and compile, based on the current bitstream, the new program to generate the new bitstream.
Example 6 includes the subject matter of any of Examples 1-5, and wherein the bitstream is a partial bitstream.
Example 7 includes the subject matter of any of Examples 1-6, and wherein the bitstream is a complete bitstream.
Example 8 includes the subject matter of any of Examples 1-7, and wherein the plurality of instructions, when executed by the one or more processors, further cause the compute device to acquire a plurality of bitstreams; acquire a new program, wherein the new program comprises a plurality of instructions; compile, for each of the bitstreams of the plurality of bitstreams and based on the corresponding bitstream of the plurality of bitstreams, the new program into a corresponding bitstream version of a plurality of bitstream versions corresponding to the new program; and store the plurality of bitstream versions corresponding to the new program in data storage of the compute device.
Example 9 includes the subject matter of any of Examples 1-8, and wherein the plurality of bitstreams includes the current bitstream, wherein the plurality of instructions further cause the compute device to retrieve the bitstream version of the plurality of bitstream versions that was compiled based on the current bitstream.
Example 10 includes the subject matter of any of Examples 1-9, and further including the FPGA, wherein the FPGA is on a multi-chip package with the one or more processors.
Example 11 includes the subject matter of any of Examples 1-10, and further including the FPGA, wherein the FPGA is on a system-on-a-chip with the one or more processors.
Example 12 includes a field programmable gate array (FPGA) comprising a bitstream receiver to receive a residual codestream of a new bitstream for loading into the FPGA, wherein the residual codestream indicates a plurality of bitstream locations which are different between the new bitstream and a current bitstream of a current configuration of the FPGA and indicates values of the new bitstream at each of the plurality of locations; a bitstream decoder to decode the residual codestream; and a bitstream loader to write, for each bitstream location of the plurality of bitstream locations, a value of the new bitstream at the corresponding bitstream location to a corresponding FPGA memory location.
Example 13 includes the subject matter of Example 12, and further including a plurality of logic gates, wherein the bitstream loader is further to pause operation of the plurality of logic gates prior to writing, for each bitstream location of the plurality of bitstream locations, the value of the new bitstream at the corresponding bitstream location to the corresponding FPGA memory location; and resume operation of the plurality of logic gates after writing, for each bitstream location of the plurality of bitstream locations, the value of the new bitstream at the corresponding bitstream location to the corresponding FPGA memory location, wherein to resume operation of the plurality of logic gates comprises to resume operation of the plurality of logic gates without waiting for any other write operation to FPGA memory locations associated with the plurality of logic gates.
Example 14 includes the subject matter of any of Examples 12 and 13, and further including a plurality of logic gates, wherein the residual codestream does not include values of the new bitstream at locations other than the plurality of locations.
Example 15 includes a method for reconfiguring a field-programmable gate array (FPGA) with a compute device, the method comprising generating, by the compute device, a residual bitstream based on a new bitstream and a current bitstream of a current configuration of the FPGA, wherein the residual bitstream indicates differences between the new bitstream and the current bitstream; generating, by the compute device, a residual codestream corresponding to the residual bitstream, wherein the residual codestream indicates a plurality of bitstream locations which are different between the new bitstream and the current bitstream and indicates values of the new bitstream at each of the plurality of bitstream locations; and loading, by the compute device, the residual codestream into the FPGA.
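On the host side, the method of Example 15 can be sketched as a single pass over the two bitstreams. Wherever the residual byte (new XOR current) is nonzero, the function emits a record. Byte granularity, equal-length bitstreams, and the 5-byte record layout are assumptions made for illustration. The patent does not fix a location granularity or wire format.

```python
import struct

# Hypothetical record layout: 4-byte big-endian byte offset, 1-byte new value.
RECORD = struct.Struct(">IB")


def residual_codestream(new: bytes, current: bytes) -> bytes:
    """Emit a (location, new value) record for every byte that differs."""
    if len(new) != len(current):
        raise ValueError("bitstreams must have the same length")
    out = bytearray()
    for location, (new_byte, cur_byte) in enumerate(zip(new, current)):
        if new_byte ^ cur_byte:  # nonzero residual byte: this location changed
            out += RECORD.pack(location, new_byte)
    return bytes(out)
```

Identical bitstreams produce an empty codestream. The output length grows with the number of changed locations, not with the size of the device.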
Example 16 includes the subject matter of Example 15, and further including generating, by the compute device, a compression codestream corresponding to the new bitstream; comparing, by the compute device, a length of the compression codestream to a length of the residual codestream; and determining, by the compute device, that the length of the residual codestream is less than the length of the compression codestream.
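The length comparison of Example 16 can be sketched as choosing whichever encoding is shorter. Here `zlib` is only a stand-in for whatever compression codec produces the compression codestream. The residual format (4-byte location, 1-byte value) is the same hypothetical layout used for illustration elsewhere.

```python
import struct
import zlib

# Hypothetical record layout: 4-byte big-endian byte offset, 1-byte new value.
RECORD = struct.Struct(">IB")


def residual_codestream(new: bytes, current: bytes) -> bytes:
    """(location, new value) records for each byte that differs."""
    if len(new) != len(current):
        raise ValueError("bitstreams must have the same length")
    return b"".join(
        RECORD.pack(i, n) for i, (n, c) in enumerate(zip(new, current)) if n != c
    )


def choose_codestream(new: bytes, current: bytes) -> tuple[str, bytes]:
    """Use the residual codestream only when it is strictly shorter."""
    residual = residual_codestream(new, current)
    compressed = zlib.compress(new)  # stand-in for the compression codestream
    if len(residual) < len(compressed):
        return "residual", residual
    return "compressed", compressed
```

A small edit to a large bitstream favors the residual path. A bitstream that differs almost everywhere favors ordinary compression, because each residual record costs several bytes.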
Example 17 includes the subject matter of any of Examples 15 and 16, and wherein generating the residual bitstream comprises performing a bit-wise exclusive or operation on the new bitstream and the current bitstream.
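The bit-wise exclusive OR of Example 17 marks every differing bit with a 1. Because XOR is its own inverse, XOR-ing the residual back onto the current bitstream reconstructs the new one. In the sketch below, MSB-first bit numbering within each byte is an assumption made for illustration.

```python
def residual_bitstream(new: bytes, current: bytes) -> bytes:
    """Bit-wise XOR: a 1 bit marks a position where the bitstreams differ."""
    if len(new) != len(current):
        raise ValueError("bitstreams must have the same length")
    return bytes(n ^ c for n, c in zip(new, current))


def differing_bit_positions(residual: bytes) -> list[int]:
    """Positions of set bits in the residual, numbered MSB-first within each byte."""
    positions = []
    for byte_index, byte in enumerate(residual):
        for bit in range(8):
            if byte & (0x80 >> bit):
                positions.append(byte_index * 8 + bit)
    return positions


def apply_residual(current: bytes, residual: bytes) -> bytes:
    """XOR is self-inverse, so current ^ residual yields the new bitstream."""
    return bytes(c ^ r for c, r in zip(current, residual))
```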
Example 18 includes the subject matter of any of Examples 15-17, and further including encoding a one-bit flag in the residual codestream, wherein the one-bit flag represents a residual encoding type for the residual codestream.
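One way to carry the one-bit flag of Example 18 is in a header that lets the decoder tell a residual codestream from an ordinarily compressed one. The sketch assumes, hypothetically, that the flag is the top bit of a 1-byte header whose remaining seven bits are reserved. The actual placement and meaning of the flag are not specified here.

```python
RESIDUAL_FLAG = 0x80  # hypothetical: top bit of a 1-byte header; low 7 bits reserved


def add_type_flag(payload: bytes, is_residual: bool) -> bytes:
    """Prefix the payload with a header byte carrying the encoding-type flag."""
    header = RESIDUAL_FLAG if is_residual else 0x00
    return bytes([header]) + payload


def parse_type_flag(codestream: bytes) -> tuple[bool, bytes]:
    """Split a codestream into (is_residual, payload)."""
    if not codestream:
        raise ValueError("empty codestream")
    return bool(codestream[0] & RESIDUAL_FLAG), codestream[1:]
```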
Example 19 includes the subject matter of any of Examples 15-18, and further including acquiring a new program, wherein the new program comprises a plurality of instructions; and compiling, based on the current bitstream, the new program to generate the new bitstream.
Example 20 includes the subject matter of any of Examples 15-19, and wherein the bitstream is a partial bitstream.
Example 21 includes the subject matter of any of Examples 15-20, and wherein the bitstream is a complete bitstream.
Example 22 includes the subject matter of any of Examples 15-21, and further including acquiring, by the compute device, a plurality of bitstreams; acquiring, by the compute device, a new program, wherein the new program comprises a plurality of instructions; compiling, by the compute device and for each of the bitstreams of the plurality of bitstreams, the new program into a corresponding bitstream version of a plurality of bitstream versions corresponding to the new program based on the corresponding bitstream of the plurality of bitstreams; and storing, by the compute device, the plurality of bitstream versions corresponding to the new program in data storage of the compute device.
Example 23 includes the subject matter of any of Examples 15-22, and wherein the plurality of bitstreams includes the current bitstream, further comprising retrieving the bitstream version of the plurality of bitstream versions that was compiled based on the current bitstream.
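Examples 22 and 23 describe compiling a new program once per candidate base bitstream ahead of time. At reconfiguration time, the version built against whatever is currently loaded is retrieved. The sketch keys stored versions by a SHA-256 digest of the base bitstream, which is an assumption. `compile_fn` is a hypothetical placeholder for the vendor toolchain. `BitstreamVersionStore` and `precompile` are illustrative names.

```python
import hashlib
from typing import Callable, Iterable, Optional


class BitstreamVersionStore:
    """Program versions keyed by the base bitstream each was compiled against."""

    def __init__(self) -> None:
        self._versions: dict[tuple[str, str], bytes] = {}

    @staticmethod
    def _key(bitstream: bytes) -> str:
        return hashlib.sha256(bitstream).hexdigest()

    def store(self, program_id: str, base: bytes, version: bytes) -> None:
        self._versions[(program_id, self._key(base))] = version

    def retrieve(self, program_id: str, current: bytes) -> Optional[bytes]:
        """Return the version compiled against the current bitstream, if any."""
        return self._versions.get((program_id, self._key(current)))


def precompile(
    store: BitstreamVersionStore,
    program_id: str,
    program: bytes,
    bases: Iterable[bytes],
    compile_fn: Callable[[bytes, bytes], bytes],  # hypothetical toolchain hook
) -> None:
    """Compile the new program once per base bitstream and store every version."""
    for base in bases:
        store.store(program_id, base, compile_fn(program, base))
```

Compiling against each likely base bitstream moves compile time off the critical path. The stored version also tends to share most of its content with the current configuration, which keeps the residual codestream short.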
Example 24 includes the subject matter of any of Examples 15-23, and wherein the FPGA is on a multi-chip package with the compute device.
Example 25 includes the subject matter of any of Examples 15-24, and wherein the FPGA is on a system-on-a-chip with the compute device.
Example 26 includes a method for reconfiguring a field programmable gate array (FPGA), the method comprising receiving, by the FPGA, a residual codestream of a new bitstream for loading into the FPGA, wherein the residual codestream indicates a plurality of bitstream locations which are different between the new bitstream and a current bitstream of a current configuration of the FPGA and indicates values of the new bitstream at each of the plurality of locations; decoding, by the FPGA, the residual codestream; and writing, by the FPGA and for each bitstream location of the plurality of bitstream locations, a value of the new bitstream at the corresponding bitstream location to a corresponding FPGA memory location.
Example 27 includes the subject matter of Example 26, and further including pausing operation of a plurality of logic gates of the FPGA prior to writing, for each bitstream location of the plurality of bitstream locations, the value of the new bitstream at the corresponding bitstream location to the corresponding FPGA memory location; and resuming operation of the plurality of logic gates after writing, for each bitstream location of the plurality of bitstream locations, the value of the new bitstream at the corresponding bitstream location to the corresponding FPGA memory location, wherein resuming operation of the plurality of logic gates comprises resuming operation of the plurality of logic gates without waiting for any other write operation to FPGA memory locations associated with the plurality of logic gates.
Example 28 includes the subject matter of any of Examples 26 and 27, and wherein the residual codestream does not include values of the new bitstream at locations other than the plurality of locations.
Example 29 includes one or more computer-readable media comprising a plurality of instructions stored thereon that, when executed, cause a compute device to perform the method of any of Examples 15-28.
Example 30 includes a compute device to reconfigure a field-programmable gate array (FPGA), the compute device comprising means for generating a residual bitstream based on a new bitstream and a current bitstream of a current configuration of the FPGA, wherein the residual bitstream indicates differences between the new bitstream and the current bitstream; means for generating a residual codestream corresponding to the residual bitstream, wherein the residual codestream indicates a plurality of bitstream locations which are different between the new bitstream and the current bitstream and indicates values of the new bitstream at each of the plurality of bitstream locations; and circuitry for loading the residual codestream into the FPGA.
Example 31 includes the subject matter of Example 30, and further including means for generating a compression codestream corresponding to the new bitstream; means for comparing a length of the compression codestream to a length of the residual codestream; and means for determining that the length of the residual codestream is less than the length of the compression codestream.
Example 32 includes the subject matter of any of Examples 30 and 31, and wherein the means for generating the residual bitstream comprises means for performing a bit-wise exclusive or operation on the new bitstream and the current bitstream.
Example 33 includes the subject matter of any of Examples 30-32, and further including means for encoding a one-bit flag in the residual codestream, wherein the one-bit flag represents a residual encoding type for the residual codestream.
Example 34 includes the subject matter of any of Examples 30-33, and further including means for acquiring a new program, wherein the new program comprises a plurality of instructions; and means for compiling, based on the current bitstream, the new program to generate the new bitstream.
Example 35 includes the subject matter of any of Examples 30-34, and wherein the bitstream is a partial bitstream.
Example 36 includes the subject matter of any of Examples 30-35, and wherein the bitstream is a complete bitstream.
Example 37 includes the subject matter of any of Examples 30-36, and further including means for acquiring a plurality of bitstreams; means for acquiring a new program, wherein the new program comprises a plurality of instructions; means for compiling, for each of the bitstreams of the plurality of bitstreams, the new program into a corresponding bitstream version of a plurality of bitstream versions corresponding to the new program based on the corresponding bitstream of the plurality of bitstreams; and means for storing the plurality of bitstream versions corresponding to the new program in data storage of the compute device.
Example 38 includes the subject matter of any of Examples 30-37, and wherein the plurality of bitstreams includes the current bitstream, further comprising means for retrieving the bitstream version of the plurality of bitstream versions that was compiled based on the current bitstream.
Example 39 includes the subject matter of any of Examples 30-38, and further including the FPGA, wherein the FPGA is on a multi-chip package with the one or more processors.
Example 40 includes the subject matter of any of Examples 30-39, and further including the FPGA, wherein the FPGA is on a system-on-a-chip with the one or more processors.
Example 41 includes a field programmable gate array (FPGA), comprising circuitry for receiving, by the FPGA, a residual codestream of a new bitstream for loading into the FPGA, wherein the residual codestream indicates a plurality of bitstream locations which are different between the new bitstream and a current bitstream of a current configuration of the FPGA and indicates values of the new bitstream at each of the plurality of locations; means for decoding, by the FPGA, the residual codestream; and circuitry for writing, by the FPGA and for each bitstream location of the plurality of bitstream locations, a value of the new bitstream at the corresponding bitstream location to a corresponding FPGA memory location.
Example 42 includes the subject matter of Example 41, and further including means for pausing operation of a plurality of logic gates of the FPGA prior to writing, for each bitstream location of the plurality of bitstream locations, the value of the new bitstream at the corresponding bitstream location to the corresponding FPGA memory location; and means for resuming operation of the plurality of logic gates after writing, for each bitstream location of the plurality of bitstream locations, the value of the new bitstream at the corresponding bitstream location to the corresponding FPGA memory location, wherein the means for resuming operation of the plurality of logic gates comprises means for resuming operation of the plurality of logic gates without waiting for any other write operation to FPGA memory locations associated with the plurality of logic gates.
Example 43 includes the subject matter of any of Examples 41 and 42, and wherein the residual codestream does not include values of the new bitstream at locations other than the plurality of locations.

Claims (25)

  1. A compute device to reconfigure a field-programmable gate array (FPGA), the compute device comprising:
    one or more processors;
    one or more memory devices having stored therein a plurality of instructions that, when executed by the one or more processors, cause the compute device to:
    generate a residual bitstream based on a new bitstream for the FPGA and a current bitstream of a current configuration of the FPGA, wherein the residual bitstream indicates differences between the new bitstream and the current bitstream;
    generate a residual codestream corresponding to the residual bitstream, wherein the residual codestream indicates a plurality of bitstream locations which are different between the new bitstream and the current bitstream and indicates values of the new bitstream at each of the plurality of bitstream locations; and
    load the residual codestream into the FPGA.
  2. The compute device of claim 1, wherein the plurality of instructions, when executed by the one or more processors, further cause the compute device to:
    generate a compression codestream corresponding to the new bitstream;
    compare a length of the compression codestream to a length of the residual codestream; and
    determine that the length of the residual codestream is less than the length of the compression codestream.
  3. The compute device of claim 2, wherein to generate the residual bitstream comprises to perform a bit-wise exclusive or operation on the new bitstream and the current bitstream.
  4. The compute device of claim 1, wherein the plurality of instructions further cause the compute device to:
    acquire a new program, wherein the new program comprises a plurality of instructions; and
    compile, based on the current bitstream, the new program to generate the new bitstream.
  5. The compute device of claim 1, wherein the bitstream is a partial bitstream.
  6. The compute device of claim 1, wherein the bitstream is a complete bitstream.
  7. The compute device of claim 1, wherein the plurality of instructions, when executed by the one or more processors, further cause the compute device to:
    acquire a plurality of bitstreams;
    acquire a new program, wherein the new program comprises a plurality of instructions;
    compile, for each of the bitstreams of the plurality of bitstreams and based on the corresponding bitstream of the plurality of bitstreams, the new program into a corresponding bitstream version of a plurality of bitstream versions corresponding to the new program; and
    store the plurality of bitstream versions corresponding to the new program in data storage of the compute device.
  8. The compute device of claim 7, wherein the plurality of bitstreams includes the current bitstream, wherein the plurality of instructions further cause the compute device to retrieve the bitstream version of the plurality of bitstream versions that was compiled based on the current bitstream.
  9. The compute device of claim 1, further comprising the FPGA, wherein the FPGA is on a multi-chip package with the one or more processors.
  10. The compute device of claim 1, further comprising the FPGA, wherein the FPGA is on a system-on-a-chip with the one or more processors.
  11. A field programmable gate array (FPGA) comprising:
    a bitstream receiver to receive a residual codestream of a new bitstream for loading into the FPGA, wherein the residual codestream indicates a plurality of bitstream locations which are different between the new bitstream and a current bitstream of a current configuration of the FPGA and indicates values of the new bitstream at each of the plurality of locations;
    a bitstream decoder to decode the residual codestream; and
    a bitstream loader to write, for each bitstream location of the plurality of bitstream locations, a value of the new bitstream at the corresponding bitstream location to a corresponding FPGA memory location.
  12. The FPGA of claim 11, further comprising a plurality of logic gates, wherein the bitstream loader is further to:
    pause operation of the plurality of logic gates prior to writing, for each bitstream location of the plurality of bitstream locations, the value of the new bitstream at the corresponding bitstream location to the corresponding FPGA memory location; and
    resume operation of the plurality of logic gates after writing, for each bitstream location of the plurality of bitstream locations, the value of the new bitstream at the corresponding bitstream location to the corresponding FPGA memory location, wherein to resume operation of the plurality of logic gates comprises to resume operation of the plurality of logic gates without waiting for any other write operation to FPGA memory locations associated with the plurality of logic gates.
  13. The FPGA of claim 12, further comprising a plurality of logic gates, wherein the residual codestream does not include values of the new bitstream at locations other than the plurality of locations.
  14. A method for reconfiguring a field-programmable gate array (FPGA) with a compute device, the method comprising:
    generating, by the compute device, a residual bitstream based on a new bitstream and a current bitstream of a current configuration of the FPGA, wherein the residual bitstream indicates differences between the new bitstream and the current bitstream;
    generating, by the compute device, a residual codestream corresponding to the residual bitstream, wherein the residual codestream indicates a plurality of bitstream locations which are different between the new bitstream and the current bitstream and indicates values of the new bitstream at each of the plurality of bitstream locations; and
    loading, by the compute device, the residual codestream into the FPGA.
  15. The method of claim 14, further comprising:
    generating, by the compute device, a compression codestream corresponding to the new bitstream;
    comparing, by the compute device, a length of the compression codestream to a length of the residual codestream; and
    determining, by the compute device, that the length of the residual codestream is less than the length of the compression codestream.
  16. The method of claim 15, wherein generating the residual bitstream comprises performing a bit-wise exclusive or operation on the new bitstream and the current bitstream.
  17. The method of claim 14, further comprising:
    acquiring a new program, wherein the new program comprises a plurality of instructions; and
    compiling, based on the current bitstream, the new program to generate the new bitstream.
  18. The method of claim 14, wherein the bitstream is a partial bitstream.
  19. The method of claim 14, further comprising:
    acquiring, by the compute device, a plurality of bitstreams;
    acquiring, by the compute device, a new program, wherein the new program comprises a plurality of instructions;
    compiling, by the compute device and for each of the bitstreams of the plurality of bitstreams, the new program into a corresponding bitstream version of a plurality of bitstream versions corresponding to the new program based on the corresponding bitstream of the plurality of bitstreams; and
    storing, by the compute device, the plurality of bitstream versions corresponding to the new program in data storage of the compute device.
  20. The method of claim 19, wherein the plurality of bitstreams includes the current bitstream, further comprising retrieving the bitstream version of the plurality of bitstream versions that was compiled based on the current bitstream.
  21. A method for reconfiguring a field programmable gate array (FPGA), the method comprising:
    receiving, by the FPGA, a residual codestream of a new bitstream for loading into the FPGA, wherein the residual codestream indicates a plurality of bitstream locations which are different between the new bitstream and a current bitstream of a current configuration of the FPGA and indicates values of the new bitstream at each of the plurality of locations;
    decoding, by the FPGA, the residual codestream; and
    writing, by the FPGA and for each bitstream location of the plurality of bitstream locations, a value of the new bitstream at the corresponding bitstream location to a corresponding FPGA memory location.
  22. The method of claim 21, further comprising:
    pausing operation of a plurality of logic gates of the FPGA prior to writing, for each bitstream location of the plurality of bitstream locations, the value of the new bitstream at the corresponding bitstream location to the corresponding FPGA memory location; and
    resuming operation of the plurality of logic gates after writing, for each bitstream location of the plurality of bitstream locations, the value of the new bitstream at the corresponding bitstream location to the corresponding FPGA memory location, wherein resuming operation of the plurality of logic gates comprises resuming operation of the plurality of logic gates without waiting for any other write operation to FPGA memory locations associated with the plurality of logic gates.
  23. The method of claim 22, wherein the residual codestream does not include values of the new bitstream at locations other than the plurality of locations.
  24. One or more computer-readable media comprising a plurality of instructions stored thereon that, when executed, cause a compute device to perform the method of any of claims 14-23.
  25. A compute device comprising means to perform the method of any of claims 14-23.

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/090995 WO2019000362A1 (en) 2017-06-30 2017-06-30 Technologies for rapid configuration of field-programmable gate arrays

Publications (1)

Publication Number Publication Date
WO2019000362A1 true WO2019000362A1 (en) 2019-01-03

Family

ID=64740298

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110082994A1 (en) * 2009-10-06 2011-04-07 Utah State University Accelerated relocation circuit
US8352898B1 (en) * 2011-05-09 2013-01-08 Xilinx, Inc. Configurations for circuit designs
CN104636151A (en) * 2013-11-06 2015-05-20 京微雅格(北京)科技有限公司 FPGA chip configuration structure and configuration method based on application memorizers

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021052348A1 (en) * 2019-09-19 2021-03-25 Mediatek Inc. Method and apparatus of residual coding selection for lossless coding mode in video coding
US11949852B2 (en) 2019-09-19 2024-04-02 Hfi Innovation Inc. Method and apparatus of residual coding selection for lossless coding mode in video coding
CN111274199A (en) * 2020-01-23 2020-06-12 中国科学院微电子研究所 FPGA (field programmable Gate array) on-track reconstruction implementation method for injecting directional modification on difference
CN111309668A (en) * 2020-01-23 2020-06-19 中国科学院微电子研究所 On-track reconstruction implementation method of differentially injected erasing-free FPGA (field programmable Gate array)
CN113656344A (en) * 2021-08-19 2021-11-16 无锡中微亿芯有限公司 FPGA (field programmable Gate array) for realizing multi-code stream function by using configuration shift chain
CN113656344B (en) * 2021-08-19 2023-08-15 无锡中微亿芯有限公司 FPGA for realizing multi-code stream function by using configuration shift chain

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 17915305; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
Ref country code: DE
122 Ep: pct application non-entry in european phase
Ref document number: 17915305; Country of ref document: EP; Kind code of ref document: A1