WO2009062496A1 - Reconfigurable floating-point and bit-level data processing unit - Google Patents
Reconfigurable floating-point and bit-level data processing unit
- Publication number
- WO2009062496A1 (PCT/DE2008/001892)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- point
- floating
- bit
- data processing
- alu
- Prior art date
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/483—Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
Definitions
- The present invention relates to data processing and in particular, but not exclusively, to a reconfigurable data processing unit with an extension according to the invention for the accelerated processing of floating-point numbers and/or bit data, as well as to methods for data processing.
- VPU Building blocks
- the elements may include arithmetic logic units, FPGA areas, input-output cells, memory cells, analog boards, and so on. Building blocks of this type are known, for example, under the name VPU.
- This typically includes arithmetic and/or logical and/or analog and/or storage and/or networking assemblies, referred to as PAEs, and/or communicating peripheral assemblies (IO), which are connected to each other directly or through one or more bus systems.
- The PAEs are arranged in any configuration, mixture and hierarchy; this arrangement is referred to as the PAE array, or PA for short. A configuring unit can be assigned to the PAE array or to parts of it.
- VPU components, systolic arrays, neural networks, multiprocessor systems, processors with multiple arithmetic units and / or logic cells, networking and network components such as crossbar, etc., as well as FPGAs, DPGAs, transputers, etc.
- The elements according to the invention, i.e. the described floating-point arrangements, can readily be integrated, e.g., into Xilinx components of the more recent Virtex family and/or into other FPGAs, DSPs or processors.
- FIG. 1 shows, by way of example, the structure of a reconfigurable data processing unit.
- A reconfigurable data processing unit may be, for example, an FPGA (e.g. XILINX Virtex, ALTERA), a reconfigurable processor (e.g. PACT XPP, AMBRIC, MATHSTAR, STRETCH) or a processor (e.g. STRETCHPROCESSOR, CRADLE, CLEARSPEED, INTEL, AMD, ARM), or may be built on the basis of, or connected to, one of these.
- Reconfigurable preferably coarse granular and / or mixed coarse / fine granular data processing cells (0101) are arranged in a 2-dimensional or multidimensional array (0103).
- In one possible embodiment, memory cells (0102) are located at the edges of the array. Each cell individually, or even groups of cells together, can preferably be configured in their function at runtime. It is particularly advantageous if configuration and/or reconfiguration takes place at runtime without affecting cells that are not to be reconfigured.
- The cells are interconnected via a network (0104), whose connection structure and/or topology is preferably also freely configurable and/or reconfigurable at runtime. Here too, it may be advantageous if configuration and/or reconfiguration takes place at runtime without affecting network segments that are not to be reconfigured.
- The reconfigurable processor exchanges data and/or addresses with peripherals and/or memory via I/O units (0105), which may include address generators, FIFOs, caches and the like.
- FIG. 2 shows, by way of example, the structure of a reconfigurable cell which, for example, is a coarse-grained data processing cell (0101) or memory cell (0102) or logic processing cell (eg LUT-based CLB, as in the FPGA
- the cell has connections to the network (0104) such that there is a unit for tapping operands from the network (0104a) and a unit for asserting the results on the network (0104b).
- The cells are cascaded horizontally and/or vertically, so that the bus output unit (0104b) of a cell above drives the bus tapped by the bus tapping unit (0104a) of the cell below.
- At the core (0201) of the cell is a unit that can be designed differently depending on the cell function, e.g. as a coarse granular arithmetic unit, as a memory, as a logic unit (FPGA), or as a permanently implemented ASIC.
- FPGA logic unit
- ALU processor-like arithmetic unit
- At least the core (0201) is assigned a control unit (0204), which controls the sequence of the data processing (0205) and/or passes status information (TRIGGER), such as carry (CARRY), sign (NEGATIVE) and comparison values (ZERO, GREATER, LESS, EQUAL), on to the core for calculation (0205) and/or receives it from the core (0205).
- The control unit (0204) can tap TRIGGERs from the network and/or drive them onto the network.
- units for data transmission from the overlying network to the underlying network (0202) or in the opposite direction (0203) are provided, preferably laterally.
- In addition to data forwarding means, the preferably lateral units also contain data processing means, e.g. for arithmetic operations (ALU operations such as addition, subtraction, shift) and/or data link operations such as multiplexing, demultiplexing, merging, swapping and sorting of the data streams transmitted by the units.
- the two units are preferably designed such that, in addition to their DATA processing functions, they enable the forwarding of TRIGGERS as well as their processing, for example by means of FPGA-like look-up tables (LUTs).
- the core with its associated network ports is also referred to as CORE.
- the lateral units with their associated network connections are also referred to as FREG in data transmission from top to bottom or as BREG in data transmission from bottom to top.
- a cell consisting of CORE, FREG and BREG is called a PAE (Processing Array Element).
- If the CORE contains, for example, an arithmetic unit (ALU), the cell is an ALU-PAE.
- RAM CORE memory
- Any further CORE implementations are possible, in particular FPGA-like Logic Processing Units (LP), e.g. in LP-PAEs.
- To synchronize the exchange of DATA and/or TRIGGERs, the network is equipped with synchronization means, e.g. handshake lines, trigger signal transmissions and, particularly preferably, maskable trigger vector signal transmissions.
- Reconfigurable cells are either designed to process individual signals (bits), using FPGA-like look-up tables (LUTs), and/or have coarse-grained arithmetic units that typically calculate integer values (fixed-point numbers) with a width typically ranging from 4 to 48 bits.
- These cells do not support the more complex calculation of floating-point numbers directly; such calculations are possible only by configuring and interconnecting a large number of cells.
- This configured interconnection of cells is extremely inefficient, since a large number of cells are needed and much data has to be transmitted over the network. The result is increased power consumption and significantly reduced performance in floating-point calculations.
- The bus systems would also have to be adapted to the width of the floating-point numbers, which proves to be extremely inefficient for the typically more frequent calculation of fixed-point numbers.
- the following describes an arrangement which, inter alia, enables a more efficient use of the bus systems.
- the present invention describes the implementation of optimized, resource and power efficient floating point processing.
- The object of the invention is to provide something new for commercial use.
- FIG. 3 shows, by way of example, the structure according to the invention, which here is composed of the four ALU-PAEs (ALU-PAE1, ..., ALU-PAE4), where each ALU-PAE in turn consists of FREG, BREG and CORE ({FREG1, BREG1, CORE1}, {FREG2, BREG2, CORE2}, ...).
- In this example the individual data words are 16 bits wide, so the buses are 16 bits wide as well.
- The operands and results of the FREGs, BREGs and COREs are 16 bits wide; multiplication results are 32 bits wide.
- the data bus may be wider than the data words in order to transmit synchronization, trigger signals and information, etc.
- It should be mentioned that a separate synchronization and/or trigger network, or separate lines, can also be provided, and/or circuit means for setting up such a network, e.g. reconfigurably.
- Let w be the width of a fixed-point number that can be computed in an ALU-PAE (for example, 16 bits).
- ALU PAEs typically have at least two operand inputs A and B.
- the widths of the inputs typically, but not necessarily, correspond to the width of the calculable fixed-point numbers.
- ALU-PAEs fixed-point arithmetic and logic units
- The fixed-point network implemented for the width of the fixed-point units (ALU-PAEs) can be used unchanged for floating-point numbers, because several fixed-point network connections are bundled into one floating-point connection.
- a single-precision floating-point calculator (0301) is additionally implemented.
- This additional floating-point calculator does not exist in conventional array elements. Nor is it assembled purely by configuring already existing circuits. Existing circuit elements are merely used for the operation of the additional floating-point calculator; on their own, i.e. without the dedicated additional hardware of the floating-point calculator, they could not be used for floating-point operations, or at least not as well.
- Unit 0401 uses the inputs of ALU-PAE1 and ALU-PAE2 as its operand inputs and the outputs of the two ALUs as its result output.
- the floating-point number format (in this example 32-bit) is transmitted via several (in this example 2) combined fixed-point buses (16-bit in this example).
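The bundling of two 16-bit buses into one 32-bit floating-point connection can be sketched in Python. This is only an illustration: the source does not name a number format, so IEEE 754 single precision is assumed here, and the function names and the choice of which bus carries the upper half are hypothetical.

```python
import struct

def float32_to_words(x: float) -> tuple[int, int]:
    """Split a single-precision value into (high, low) 16-bit bus words.

    Assumption: IEEE 754 binary32 encoding, upper half on the first bus.
    """
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return (bits >> 16) & 0xFFFF, bits & 0xFFFF

def words_to_float32(high: int, low: int) -> float:
    """Recombine two 16-bit bus words into one single-precision value."""
    bits = ((high & 0xFFFF) << 16) | (low & 0xFFFF)
    (x,) = struct.unpack(">f", struct.pack(">I", bits))
    return x

high, low = float32_to_words(1.5)
print(hex(high), hex(low))          # the two words carried by the two buses
print(words_to_float32(high, low))  # value seen by the floating-point unit
```

Neither bus needs to be widened: each still carries one 16-bit word, and only the floating-point unit interprets the pair as a single operand.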
- In a second box (DOUBLE2), as described for DOUBLE1, the ALU-PAEs ALU-PAE3 and ALU-PAE4 are combined to form another single-precision floating-point arithmetic unit (0302), i.e. they are provided with another additional floating-point calculator.
- a third box (QUAD) is formed, which consists of the boxes DOUBLE1 and DOUBLE2.
- the width of the operand inputs and result outputs is now sufficient to implement within the QUAD a 64-bit double precision floating-point calculator.
- Nesting is not mandatory. If it is already known that only double-precision arithmetic units are required, the two single-precision arithmetic units in the individual boxes can optionally be dispensed with and a double-precision arithmetic unit provided directly and exclusively. The reverse also applies. Mixed forms are also possible within a cell array. It is preferred, inter alia, if floating-point arithmetic units are provided in rows and/or columns.
- FIG. 3 shows only a section of a reconfigurable data processing unit according to FIG. 1.
- The structure shown here can be scaled over the entire data processing unit, i.e. all the PAEs of the unit are combined into boxes accordingly.
- Alternatively, only one or more parts of a data processing unit can have the floating-point structure according to the invention. This is preferably done column by column, i.e. PAEs are combined in columns accordingly.
- Assigning state machines to the floating-point calculators is not absolutely necessary, but it is possible. State machines are particularly advantageous for operations such as root calculations and/or divisions, which typically require, or may require, iterations.
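To illustrate why an iterating controller is useful for division, the following Python sketch computes a reciprocal with the Newton-Raphson method, which reuses one multiplier and one subtractor in every pass. The source does not say which iterative algorithm is intended; Newton-Raphson, the function name, the input range and the starting estimate are assumptions chosen for illustration.

```python
def reciprocal(d: float, iterations: int = 4) -> float:
    """Approximate 1/d for 0.5 <= d < 1 using only multiply and subtract.

    Each loop pass corresponds to one state of a hypothetical state machine
    that feeds the result back to the operand inputs.
    """
    if not 0.5 <= d < 1.0:
        raise ValueError("this sketch expects d scaled into [0.5, 1)")
    x = 48.0 / 17.0 - (32.0 / 17.0) * d  # linear starting estimate
    for _ in range(iterations):
        x = x * (2.0 - d * x)            # Newton-Raphson refinement step
    return x

print(reciprocal(0.75))  # close to 1 / 0.75
```

The error roughly squares with every pass, so a small fixed number of iterations is enough; a division a / d then becomes one further multiplication, a * reciprocal(d).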
- The floating-point arithmetic units, or at least some of them, will preferably have registers or other means of memory access, for example access to memory elements in the array in which lookup tables for trigonometric and/or other functions can be stored, either configurably and/or permanently integrated. Above all, but not only, if iterations and/or other, in particular sequential, uses of a floating-point arithmetic unit are provided, it is additionally advantageous to provide feedback from the outputs to the operand inputs. It should be mentioned that, if necessary, feedback for status signals is also possible.
- FIG. 4 a again shows the arrangement described in FIG. 3 and to that extent clearly preferred, as well as the DOUBLE and QUAD boxes.
- FIG. 4b shows the mapping of the floating-point data formats to the fixed-point formats of the ALU-PAEs. Shown are 4 ALU-PAEs (0410) and their word format of four times 16 bits (0411). Underneath (0411), the word width of two 32-bit floating-point numbers is shown, and below that (0412) the word width of a 64-bit floating-point number.
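The mapping can be made concrete with a short Python sketch in which the same four 16-bit words are read either as two single-precision values (two DOUBLE boxes) or as one double-precision value (one QUAD box). IEEE 754 encodings, the word order and all function names are assumptions for illustration and are not taken from the figure.

```python
import struct

def words_to_two_singles(words):
    """Read four 16-bit words as two 32-bit single-precision values."""
    return struct.unpack(">2f", struct.pack(">4H", *words))

def words_to_double(words):
    """Read the same four 16-bit words as one 64-bit double-precision value."""
    return struct.unpack(">d", struct.pack(">4H", *words))[0]

def double_to_words(x):
    """Split a double-precision value into four 16-bit words."""
    return struct.unpack(">4H", struct.pack(">d", x))

print(double_to_words(1.0))
print(words_to_two_singles((0x3FC0, 0x0000, 0x4000, 0x0000)))
```

Which interpretation applies depends only on which floating-point unit consumes the words; the four 16-bit connections themselves are the same in both cases.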
- A major problem is the handling of error signals, such as overflow, underflow, division by zero and Not a Number (NaN) representations.
- an interrupt is typically triggered to indicate the occurrence of a fault.
- Triggering an interrupt or determining the source of the error cannot readily be carried out.
- A) The error indications of all floating-point arithmetic units are connected to a line network, which signals the occurrence of an error to a higher-level unit. This can be done by triggering an interrupt at a higher-level unit that processes the result.
- the query can take place, for example, by means of JTAG; in particular, a debugger software running on the higher-order or external unit can query the error statuses.
- An alternative method is to inject error signals (eg, overflow, underflow, divide by zero, and Not a Number (NaN)) onto the TRIGGER network within the reconfigurable data processing unit.
- The TRIGGER network forwards the error signals to the floating-point arithmetic units that subsequently process the data. These in turn OR the incoming error signals with errors that occur in their own arithmetic unit and then send them onward on the TRIGGER network together with the data.
- an error identifier is also transmitted on the TRIGGER network. This need not be realized for all network connections, it may be sufficient depending on the application, if this forwarding takes place on at least some of the data connections.
- An error state is then output that indicates the correctness or incorrectness of the result. If an erroneous result occurs, an interrupt can now also be generated at a higher-level unit which processes the result, and/or the error status of the result can be queried by a higher-level unit that processes the result further.
- Each floating-point calculator can store the error state that has occurred, e.g. overflow.
- This memory can be queried by a higher-level unit at any time, but preferably in response to the occurrence of a result marked as faulty.
- In particular, debugger software running on the higher-level or external unit can query the error messages.
- FIG. 4c shows by way of example the linking of different error states (events) in a floating-point arithmetic unit. Internally occurring errors are combined with the respective incoming error signals of the respective operands (for example A and B) and forwarded with the result.
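The combination of internal and incoming error states can be sketched as a bitwise OR of flag words. The Python model below uses ordinary Python floats in place of the hardware formats; the flag names, their bit positions and the function `fmul` are hypothetical and serve only to show the propagation scheme.

```python
from enum import IntFlag
import math

class FpError(IntFlag):
    """Illustrative error flags; the bit assignment is an assumption."""
    NONE = 0
    OVERFLOW = 1
    UNDERFLOW = 2
    DIV_BY_ZERO = 4
    NAN = 8

def fmul(a, err_a, b, err_b):
    """Multiply two operands and forward the merged error state."""
    result = a * b
    own = FpError.NONE
    if math.isnan(result) and not (math.isnan(a) or math.isnan(b)):
        own = FpError.NAN          # NaN created in this unit
    elif math.isinf(result) and not (math.isinf(a) or math.isinf(b)):
        own = FpError.OVERFLOW     # finite operands, infinite result
    elif result == 0.0 and a != 0.0 and b != 0.0:
        own = FpError.UNDERFLOW    # non-zero operands, zero result
    # OR the unit's own errors with those that arrived with A and B.
    return result, err_a | err_b | own

r1, e1 = fmul(1e200, FpError.NONE, 1e200, FpError.NONE)
r2, e2 = fmul(r1, e1, 0.5, FpError.NONE)
print(e1, e2)  # the overflow raised in the first unit is still visible
```

Because the flags travel with the data, the unit that finally consumes the result can tell that an error occurred somewhere upstream without a global interrupt line.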
- SIMD floating-point arithmetic units may be advantageous in order to achieve double or multiple precision -
- For the floating-point arithmetic units it may be sufficient if they are designed for multiplication, addition and subtraction, and preferably also for root calculation and division. This does not exclude the implementation of further functions in more complex arithmetic units, in particular comparison functions such as greater, less, equal, greater than zero, less than zero and equal to zero, and in particular also format conversion functions, e.g. from double precision to integer.
- The SIMD capability has been extended so that each ALU-PAE can also perform a single-precision floating-point calculation.
- ALU-PAEs can also be combined with this method in order to allow a greater processing width, e.g. two 32-bit SIMD/single-precision ALU-PAEs can form a 64-bit double-precision DOUBLE (formerly QUAD).
- the floating-point arithmetic units preferably have one or more internal register stages, so-called pipeline stages, which enable the operation of the arithmetic units at high frequencies.
- This is particularly advantageous in data flow architectures such as the Applicant's PACT XPP technology, as these architectures typically have few or no pipeline stalls.
- the processor model largely avoids loops within a configuration, so there are no feedback effects that adversely affect performance when using pipelines.
- Each box is provided with multiplexers with which output signals from either the conventional arithmetic units, i.e. the fixed-point arithmetic units, or the floating-point arithmetic units can be switched to a bus or to another output element such as a memory, an I/O port and the like.
- This multiplexer can be fed either by the integer arithmetic units of a single cell, by the single-precision floating-point arithmetic unit of a box combining two individual cells, or by the double-precision arithmetic unit of a double box. It should be mentioned that, in addition to data, the corresponding trigger and/or synchronization and/or control signals can be multiplexed as well.
- Another aspect of the invention relates to an efficient unit for processing Boolean operations (BPU, Bit Processing Unit). In terms of applications, the following calculations, for example, are of particular importance for the unit: implementing state machines; implementing decoders and encoders; performing permutations at the bit level, e.g. as required for DES/3DES; and serial bit arithmetic, e.g. pseudo-noise generators.
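A pseudo-noise generator of the kind mentioned here is commonly built as a linear feedback shift register (LFSR), which produces one bit per step from a few XOR gates. The Python sketch below is a generic model and not a circuit taken from the source; the function names, the 4-bit width and the tap positions are illustrative assumptions.

```python
def lfsr_step(state: int, shifts: tuple[int, ...], width: int) -> int:
    """Advance a Fibonacci LFSR by one bit.

    `shifts` lists the bit positions that are XORed to form the feedback
    bit, which is then shifted in at the top of the register.
    """
    feedback = 0
    for s in shifts:
        feedback ^= (state >> s) & 1
    return (state >> 1) | (feedback << (width - 1))

def lfsr_period(seed: int, shifts: tuple[int, ...], width: int) -> int:
    """Count the steps until the register returns to its seed value."""
    state, steps = seed, 0
    while True:
        state = lfsr_step(state, shifts, width)
        steps += 1
        if state == seed:
            return steps

# 4-bit register with feedback from bits 0 and 1 (illustrative choice).
print(lfsr_period(0b0001, (0, 1), 4))
```

Each step touches every bit of the register but needs only a shift and a two-input XOR, which is a natural fit for bit-level hardware and a poor fit for a word-wide ALU.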
- Coarse granular arithmetic units such as ALUs
- ALUs are poorly suited for the applications mentioned by way of example, since very many calculation steps are necessary to calculate a single bit; at the same time, only a few bits of a wide data word (e.g. 16 bits), typically only one, are actually used.
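The cost of bit-level work on a word-oriented ALU can be seen in a small Python sketch of a bit permutation. Every output bit needs its own shift, mask, shift and OR, although a hardware permutation is just wiring. The function name and the 8-bit reversal table are hypothetical; the table is not a DES permutation.

```python
def permute_bits(word: int, source_of: list[int]) -> int:
    """Build a word whose bit i is taken from bit source_of[i] of `word`."""
    result = 0
    for out_pos, in_pos in enumerate(source_of):
        bit = (word >> in_pos) & 1   # shift + mask: two ALU steps
        result |= bit << out_pos     # shift + OR: two more ALU steps
    return result

REVERSE8 = [7, 6, 5, 4, 3, 2, 1, 0]  # illustrative table: reverse 8 bits
print(bin(permute_bits(0b00010011, REVERSE8)))
```

For an n-bit permutation this loop performs on the order of 4n word-wide operations, and in each of them only a single bit of the data word carries useful information.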
- The BPU according to the invention is designed less for arbitrary use with logic networks than specifically for the following functionality: 1. construction of state machines; 2. construction of counters and shifters.
- a first essential aspect of the present invention is the implementation of hardware elements for performing dense and efficient bit-serial operations.
- Supporting conditional multiplexers directly in hardware is seen as another essential aspect of the invention.
- Any combinatorial network can be built from multiplexers.
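That multiplexers suffice for combinatorial logic can be checked with a short Python sketch that derives NOT, AND, OR and XOR from a single 2:1 multiplexer with constant inputs. The function names are illustrative and do not come from the source.

```python
def mux(sel: int, a: int, b: int) -> int:
    """2:1 multiplexer: returns a when sel is 0 and b when sel is 1."""
    return b if sel else a

def not_(x: int) -> int:
    return mux(x, 1, 0)

def and_(x: int, y: int) -> int:
    return mux(x, 0, y)

def or_(x: int, y: int) -> int:
    return mux(x, y, 1)

def xor_(x: int, y: int) -> int:
    return mux(x, y, not_(y))

for x in (0, 1):
    for y in (0, 1):
        print(x, y, and_(x, y), or_(x, y), xor_(x, y))
```

Since AND, OR and NOT form a complete set of Boolean operators, any combinatorial function can be composed from such multiplexers.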
- Hardware design languages such as Verilog or VHDL are essentially based on the use of conditional multiplexing operations, which are then transferred from synthesis tools to gate netlists.
- The programmer can also write the HDL more optimally, because a simple and fundamental understanding of the underlying hardware allows him to improve his code significantly with regard to arithmetic/architecture and implementation.
- Although state-of-the-art synthesis tools usually provide rather good automatic optimization techniques, they often fail on particularly critical and relevant code; at the same time, the synthesis tools take away any direct influence on the hardware, so that an optimal implementation is often hardly possible.
- logic processing units now have a comparator which evaluates the logical truth or the logical value of bool_funcl. This is preferably done via a common fast comparator, e.g. composed of linked XOR gates.
- The multiplexers may be 1 bit wide or several bits wide; preferably, the hardware implementation will allow an optimized mixture.
- The hardware implementation will provide means to enable simple logical operations (Boolean functions) in front of the multiplexer. For example, a 2-input look-up table can be implemented before each multiplexer input, which allows any Boolean combination of two input signals or the direct forwarding of only one of the signals.
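A two-input look-up table in front of each multiplexer input can be modelled as a 4-bit truth table. The Python sketch below shows how one configuration word selects AND, OR, XOR or a plain pass-through of one signal. The encoding of the table bits and all names are assumptions made for illustration.

```python
def lut2(table: int, a: int, b: int) -> int:
    """Evaluate a 4-bit truth table; the bit index is (b << 1) | a."""
    return (table >> ((b << 1) | a)) & 1

AND2, OR2, XOR2 = 0b1000, 0b1110, 0b0110
PASS_A, PASS_B = 0b1010, 0b1100   # forward only one of the two signals

def lut_mux(sel: int, cfg0: int, in0: tuple, cfg1: int, in1: tuple) -> int:
    """2:1 multiplexer whose inputs are each preceded by a 2-input LUT."""
    return lut2(cfg1, *in1) if sel else lut2(cfg0, *in0)

# Select between (a AND b) and a passed-through c, depending on sel.
print(lut_mux(0, AND2, (1, 1), PASS_A, (0, 0)))
print(lut_mux(1, AND2, (1, 1), PASS_A, (0, 0)))
```

Sixteen different configuration words cover every Boolean function of two inputs, so the small table adds flexibility without the routing effort of a general logic array.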
- FIG. 6 shows a further variant of a BPU according to the invention. Shown is a 4x4 patch of a configurable logic array (Field Programmable Gate Array, FPGA). Each gate is based on a 3-input to 3-output lookup table (LUT, 0601), which calculates an independent lookup function for each of the three outputs, each based on all 3 inputs.
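A 3-input to 3-output lookup table of this kind can be modelled as three independent 8-bit truth tables that share one input index. In the Python sketch below, one cell is configured as a full adder with an additional pass-through output; the configuration values, the bit ordering and the function name are illustrative assumptions.

```python
def lut3x3(tables: tuple[int, int, int], a: int, b: int, c: int) -> tuple:
    """Evaluate three 8-bit truth tables on the same three inputs."""
    index = (c << 2) | (b << 1) | a
    return tuple((t >> index) & 1 for t in tables)

SUM = 0x96      # a XOR b XOR c
CARRY = 0xE8    # majority of a, b, c
PASS_A = 0xAA   # forward input a unchanged

FULL_ADDER_CELL = (SUM, CARRY, PASS_A)
print(lut3x3(FULL_ADDER_CELL, 1, 1, 0))
```

Because each output has its own table, a single cell can compute a result, a carry for a neighbouring cell and a forwarded signal at the same time.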
- the individual cells do not have a register function, but a set of cells (in this example, a 4x4 matrix) are assigned registers at the edges (in this example, at the
- Each output of the edge LUTs (0602) is assigned a configurable register (0603), which can either be switched in to forward the output signal with a delay or be bypassed by means of a multiplexer function, which corresponds to immediate forwarding of the output signal.
- the LUTs receive the input signals from a higher-level bus system configurable via multiplexer (0604).
- feedback of the register values (f [0..2] [0..2]) to the LUT inputs is also possible via the multiplexer (0604).
- the 4x4 matrix shown is freely cascadable, allowing large configurable logic fields to be built.
- An essential aspect of the BPU according to the invention is the improved prediction of the timing and the protection against so-called undelayed feedback loops, which can lead to physical destruction of the circuit due to an asynchronous feedback.
- The following rule is implemented for this: data is only passed through the logic field in one direction along the north-south axis and in one direction along the east-west axis.
- The main signal direction is within a column from north to south; for transfers, signals can be transmitted within a row from west to east. Diagonal signal transmission is possible in the north-south direction.
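The direction rule can be expressed as a simple check on a list of cell-to-cell connections. The Python sketch below assumes coordinates of the form (row, column) with rows growing southwards and columns growing eastwards, connections only between neighbouring cells, and a diagonal step that leads south-east; these conventions and all names are assumptions, since the source does not spell them out.

```python
# Allowed steps as (row delta, column delta): south, east, south-east.
ALLOWED_STEPS = {(1, 0), (0, 1), (1, 1)}

def is_legal(connections) -> bool:
    """Return True if every connection follows the direction rule."""
    return all(
        (dst_row - src_row, dst_col - src_col) in ALLOWED_STEPS
        for (src_row, src_col), (dst_row, dst_col) in connections
    )

forward_only = [((0, 0), (1, 0)), ((1, 0), (1, 1)), ((1, 1), (2, 2))]
with_loop = forward_only + [((2, 2), (1, 2))]  # one step back to the north

print(is_legal(forward_only), is_legal(with_loop))
```

Every allowed step increases the sum of row and column, so no chain of legal connections can return to its starting cell. Under these assumptions a combinational loop cannot be configured, and the longest signal path through a patch is bounded by its size.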
- FIG. 7 shows an integration example of the inventive BPU according to FIG. 6 into the inventor's VPU architecture.
- the circuit has a bus input interface (0701) which receives data and / or triggers from a configurable bus system.
- A bus output interface (0702) switches the signals generated by the logic array (0703) to data and/or trigger buses.
- Logic field 0703 comprises a plurality of BPUs of FIG. 6 arranged in tiled multidimensional fashion. The arrows illustrate the directions of travel of the signals within the logic array, as described in FIG. 6.
- the bus interfaces and the logic field are assigned a freely programmable state machine (0704), which takes over the sequence control of the bus transfers and / or generation of control signals and / or synchronization tasks.
- the VPU technology has handshake protocols for the automatic synchronization of data and / or trigger transmissions.
- the state machine (0704) also manages, in particular, the handshakes (RDY / ACK) of the bus protocols of the input and / or output bus.
- the signals from the bus input interface (0701) and / or bus output interface (0702) are sent to the state machine for control, which generates control signals for controlling the data transmissions for the corresponding interface. Furthermore, the state machine receives signals from the logic field (0703) in order to be able to react to its internal states. Conversely, the state machine can transmit control signals to the logic field.
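The role of the state machine in a RDY/ACK handshake can be illustrated with a cycle-by-cycle Python model. The sketch shows a generic ready/acknowledge scheme in which a word is handed over only in cycles where both signals are active; the exact timing of the VPU bus protocol is not given in the source, so the function name and the per-cycle behaviour are assumptions.

```python
from collections import deque

def transfer(words, receiver_ready):
    """Simulate a RDY/ACK transfer, one list entry of receiver_ready per cycle.

    Returns the words that were received and the words still pending.
    """
    pending = deque(words)
    received = []
    for can_accept in receiver_ready:
        rdy = bool(pending)           # sender offers valid data
        ack = rdy and can_accept      # receiver acknowledges valid data only
        if rdy and ack:
            received.append(pending.popleft())
    return received, list(pending)

print(transfer([1, 2, 3], [True, False, True, False]))
```

No word is lost or duplicated when the receiver stalls, which is the property that a permanently implemented handshake controller guarantees regardless of what the programmable logic field is doing.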
- the state machine can preferably be programmed over a wide range in order to ensure maximum flexibility for the use of the logic field.
- The functionally critical parts of the state machine, e.g. the handshake protocols of the buses, are permanently implemented. This ensures the basic functionality of a BPU at the system level: by definition, all bus transfers at system level are carried out correctly by the fixed part of the state machine. This greatly facilitates programming and debugging at the system level.
- This permanently implemented part of the state machine (0704) is assigned the freely programmable part, in which the programmer can implement the control of the logic field depending on the application.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2010533431A JP2011503733A (ja) | 2007-11-17 | 2008-11-17 | Reconfigurable floating-point and bit-level data processing unit |
EP08850421A EP2220554A1 (de) | 2007-11-17 | 2008-11-17 | Reconfigurable floating-point and bit-level data processing unit |
DE112008003643T DE112008003643A5 (de) | 2007-11-17 | 2008-11-17 | Reconfigurable floating-point and bit-level data processing unit |
US12/743,356 US20100281235A1 (en) | 2007-11-17 | 2008-11-17 | Reconfigurable floating-point and bit-level data processing unit |
Applications Claiming Priority (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DE102007055131 | 2007-11-17 | ||
DE102007055131.4 | 2007-11-17 | ||
DE102007056806 | 2007-11-23 | ||
DE102007056806.3 | 2007-11-23 | ||
DE102008014705.2 | 2008-03-18 | ||
DE102008014705 | 2008-03-18 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2009062496A1 (de) | 2009-05-22 |
Family
ID=40384208
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/DE2008/001892 WO2009062496A1 (de) | 2007-11-17 | 2008-11-17 | Reconfigurable floating-point and bit-level data processing unit |
Country Status (5)
Country | Link |
---|---|
US (1) | US20100281235A1 (de) |
EP (1) | EP2220554A1 (de) |
JP (1) | JP2011503733A (de) |
DE (1) | DE112008003643A5 (de) |
WO (1) | WO2009062496A1 (de) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9465578B2 (en) * | 2013-12-13 | 2016-10-11 | Nvidia Corporation | Logic circuitry configurable to perform 32-bit or dual 16-bit floating-point operations |
US10409614B2 (en) | 2017-04-24 | 2019-09-10 | Intel Corporation | Instructions having support for floating point and integer data types in the same register |
US10474458B2 (en) | 2017-04-28 | 2019-11-12 | Intel Corporation | Instructions and logic to perform floating-point and integer operations for machine learning |
US11934342B2 (en) | 2019-03-15 | 2024-03-19 | Intel Corporation | Assistance for hardware prefetch in cache access |
CN112905240A (zh) | 2019-03-15 | 2021-06-04 | 英特尔公司 | 用于脉动阵列上的块稀疏操作的架构 |
US20220114096A1 (en) | 2019-03-15 | 2022-04-14 | Intel Corporation | Multi-tile Memory Management for Detecting Cross Tile Access Providing Multi-Tile Inference Scaling and Providing Page Migration |
Family Cites Families (99)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US56062A (en) * | 1866-07-03 | Improved machine for making nuts | ||
US36988A (en) * | 1862-11-25 | Improvement in the refining and manufacture of sugar | ||
US3564506A (en) * | 1968-01-17 | 1971-02-16 | Ibm | Instruction retry byte counter |
US3753008A (en) * | 1970-06-20 | 1973-08-14 | Honeywell Inf Systems | Memory pre-driver circuit |
US5459846A (en) * | 1988-12-02 | 1995-10-17 | Hyatt; Gilbert P. | Computer architecture system having an imporved memory |
US3754211A (en) * | 1971-12-30 | 1973-08-21 | Ibm | Fast error recovery communication controller |
US3956589A (en) * | 1973-11-26 | 1976-05-11 | Paradyne Corporation | Data telecommunication system |
DE2713648A1 (de) * | 1976-03-26 | 1977-10-06 | Tokyo Shibaura Electric Co | Stromzufuhr-steuervorrichtung fuer speichervorrichtungen |
US4594682A (en) * | 1982-12-22 | 1986-06-10 | Ibm Corporation | Vector processing |
US4646300A (en) * | 1983-11-14 | 1987-02-24 | Tandem Computers Incorporated | Communications method |
US4748580A (en) * | 1985-08-30 | 1988-05-31 | Advanced Micro Devices, Inc. | Multi-precision fixed/floating-point processor |
US5070475A (en) * | 1985-11-14 | 1991-12-03 | Data General Corporation | Floating point unit interface |
US4760525A (en) * | 1986-06-10 | 1988-07-26 | The United States Of America As Represented By The Secretary Of The Air Force | Complex arithmetic vector processor for performing control function, scalar operation, and set-up of vector signal processing instruction |
US5119290A (en) * | 1987-10-02 | 1992-06-02 | Sun Microsystems, Inc. | Alias address support |
CA1286421C (en) * | 1987-10-14 | 1991-07-16 | Martin Claude Lefebvre | Message fifo buffer controller |
US5081575A (en) * | 1987-11-06 | 1992-01-14 | Oryx Corporation | Highly parallel computer architecture employing crossbar switch with selectable pipeline delay |
US5031179A (en) * | 1987-11-10 | 1991-07-09 | Canon Kabushiki Kaisha | Data communication apparatus |
NL8800071A (nl) * | 1988-01-13 | 1989-08-01 | Philips Nv | Dataprocessorsysteem en videoprocessorsysteem, voorzien van een dergelijk dataprocessorsysteem. |
US4939641A (en) * | 1988-06-30 | 1990-07-03 | Wang Laboratories, Inc. | Multi-processor system with cache memories |
US5245616A (en) * | 1989-02-24 | 1993-09-14 | Rosemount Inc. | Technique for acknowledging packets |
WO1991011765A1 (en) * | 1990-01-29 | 1991-08-08 | Teraplex, Inc. | Architecture for minimal instruction set computing system |
US5036493A (en) * | 1990-03-15 | 1991-07-30 | Digital Equipment Corporation | System and method for reducing power usage by multiple memory modules |
CA2045773A1 (en) * | 1990-06-29 | 1991-12-30 | Compaq Computer Corporation | Byte-compare operation for high-performance processor |
JPH04328657A (ja) * | 1991-04-30 | 1992-11-17 | Toshiba Corp | キャッシュメモリ |
JP2572522B2 (ja) * | 1992-05-12 | 1997-01-16 | インターナショナル・ビジネス・マシーンズ・コーポレイション | コンピューティング装置 |
US5339840A (en) * | 1993-04-26 | 1994-08-23 | Sunbelt Precision Products Inc. | Adjustable comb |
US5435000A (en) * | 1993-05-19 | 1995-07-18 | Bull Hn Information Systems Inc. | Central processing unit using dual basic processing units and combined result bus |
US5581734A (en) * | 1993-08-02 | 1996-12-03 | International Business Machines Corporation | Multiprocessor system with shared cache and data input/output circuitry for transferring data amount greater than system bus capacity |
US5502838A (en) * | 1994-04-28 | 1996-03-26 | Consilium Overseas Limited | Temperature management for integrated circuits |
US6064819A (en) * | 1993-12-08 | 2000-05-16 | Imec | Control flow and memory management optimization |
US5677909A (en) * | 1994-05-11 | 1997-10-14 | Spectrix Corporation | Apparatus for exchanging data between a central station and a plurality of wireless remote stations on a time divided commnication channel |
US6217234B1 (en) * | 1994-07-29 | 2001-04-17 | Discovision Associates | Apparatus and method for processing data with an arithmetic unit |
US5584013A (en) * | 1994-12-09 | 1996-12-10 | International Business Machines Corporation | Hierarchical cache arrangement wherein the replacement of an LRU entry in a second level cache is prevented when the cache entry is the only inclusive entry in the first level cache |
US5603005A (en) * | 1994-12-27 | 1997-02-11 | Unisys Corporation | Cache coherency scheme for XBAR storage structure with delayed invalidates until associated write request is executed |
JP3598139B2 (ja) * | 1994-12-28 | 2004-12-08 | Hitachi, Ltd. | Data processing device |
US5682491A (en) * | 1994-12-29 | 1997-10-28 | International Business Machines Corporation | Selective processing and routing of results among processors controlled by decoding instructions using mask value derived from instruction tag and processor identifier |
US5778237A (en) * | 1995-01-10 | 1998-07-07 | Hitachi, Ltd. | Data processor and single-chip microcomputer with changing clock frequency and operating voltage |
ZA965340B (en) * | 1995-06-30 | 1997-01-27 | Interdigital Tech Corp | Code division multiple access (cdma) communication system |
US5784313A (en) * | 1995-08-18 | 1998-07-21 | Xilinx, Inc. | Programmable logic device including configuration data or user data memory slices |
CA2166369C (en) * | 1995-12-29 | 2004-10-19 | Robert J. Blainey | Method and system for determining inter-compilation unit alias information |
US5898602A (en) * | 1996-01-25 | 1999-04-27 | Xilinx, Inc. | Carry chain circuit with flexible carry function for implementing arithmetic and logical functions |
US5727229A (en) * | 1996-02-05 | 1998-03-10 | Motorola, Inc. | Method and apparatus for moving data in a parallel processor |
JP3934710B2 (ja) * | 1996-09-13 | 2007-06-20 | Renesas Technology Corp. | Microprocessor |
US5832288A (en) * | 1996-10-18 | 1998-11-03 | Samsung Electronics Co., Ltd. | Element-select mechanism for a vector processor |
US5895487A (en) * | 1996-11-13 | 1999-04-20 | International Business Machines Corporation | Integrated processing and L2 DRAM cache |
US5913925A (en) * | 1996-12-16 | 1999-06-22 | International Business Machines Corporation | Method and system for constructing a program including out-of-order threads and processor and method for executing threads out-of-order |
GB2323188B (en) * | 1997-03-14 | 2002-02-06 | Nokia Mobile Phones Ltd | Enabling and disabling clocking signals to elements |
US5996048A (en) * | 1997-06-20 | 1999-11-30 | Sun Microsystems, Inc. | Inclusion vector architecture for a level two cache |
US6058266A (en) * | 1997-06-24 | 2000-05-02 | International Business Machines Corporation | Method of, system for, and computer program product for performing weighted loop fusion by an optimizing compiler |
US6072348A (en) * | 1997-07-09 | 2000-06-06 | Xilinx, Inc. | Programmable power reduction in a clock-distribution circuit |
US6026478A (en) * | 1997-08-01 | 2000-02-15 | Micron Technology, Inc. | Split embedded DRAM processor |
US6078736A (en) * | 1997-08-28 | 2000-06-20 | Xilinx, Inc. | Method of designing FPGAs for dynamically reconfigurable computing |
US6212544B1 (en) * | 1997-10-23 | 2001-04-03 | International Business Machines Corporation | Altering thread priorities in a multithreaded processor |
JPH11147335A (ja) * | 1997-11-18 | 1999-06-02 | Fuji Xerox Co Ltd | Drawing processing device |
US6075935A (en) * | 1997-12-01 | 2000-06-13 | Improv Systems, Inc. | Method of generating application specific integrated circuits using a programmable hardware architecture |
US6260114B1 (en) * | 1997-12-30 | 2001-07-10 | Mcmz Technology Innovations, Llc | Computer cache memory windowing |
US6096091A (en) * | 1998-02-24 | 2000-08-01 | Advanced Micro Devices, Inc. | Dynamically reconfigurable logic networks interconnected by fall-through FIFOs for flexible pipeline processing in a system-on-a-chip |
US6298043B1 (en) * | 1998-03-28 | 2001-10-02 | Nortel Networks Limited | Communication system architecture and a connection verification mechanism therefor |
US6456628B1 (en) * | 1998-04-17 | 2002-09-24 | Intelect Communications, Inc. | DSP intercommunication network |
US6173419B1 (en) * | 1998-05-14 | 2001-01-09 | Advanced Technology Materials, Inc. | Field programmable gate array (FPGA) emulator for debugging software |
US6052524A (en) * | 1998-05-14 | 2000-04-18 | Software Development Systems, Inc. | System and method for simulation of integrated hardware and software components |
US6449283B1 (en) * | 1998-05-15 | 2002-09-10 | Polytechnic University | Methods and apparatus for providing a fast ring reservation arbitration |
WO2001006371A1 (en) * | 1998-07-21 | 2001-01-25 | Seagate Technology Llc | Improved memory system apparatus and method |
US6289369B1 (en) * | 1998-08-25 | 2001-09-11 | International Business Machines Corporation | Affinity, locality, and load balancing in scheduling user program-level threads for execution by a computer system |
US20020152060A1 (en) * | 1998-08-31 | 2002-10-17 | Tseng Ping-Sheng | Inter-chip communication system |
US7100026B2 (en) * | 2001-05-30 | 2006-08-29 | The Massachusetts Institute Of Technology | System and method for performing efficient conditional vector operations for data parallel architectures involving both input and conditional vector values |
US6249756B1 (en) * | 1998-12-07 | 2001-06-19 | Compaq Computer Corp. | Hybrid flow control |
US6826763B1 (en) * | 1998-12-11 | 2004-11-30 | Microsoft Corporation | Accelerating a distributed component architecture over a network using a direct marshaling |
US6694434B1 (en) * | 1998-12-23 | 2004-02-17 | Entrust Technologies Limited | Method and apparatus for controlling program execution and program distribution |
US6496902B1 (en) * | 1998-12-31 | 2002-12-17 | Cray Inc. | Vector and scalar data cache for a vector multiprocessor |
US6321298B1 (en) * | 1999-01-25 | 2001-11-20 | International Business Machines Corporation | Full cache coherency across multiple raid controllers |
US6191614B1 (en) * | 1999-04-05 | 2001-02-20 | Xilinx, Inc. | FPGA configuration circuit including bus-based CRC register |
GB9909196D0 (en) * | 1999-04-21 | 1999-06-16 | Texas Instruments Ltd | Transfer controller with hub and ports architecture |
US6624819B1 (en) * | 2000-05-01 | 2003-09-23 | Broadcom Corporation | Method and system for providing a flexible and efficient processor for use in a graphics processing system |
US6845445B2 (en) * | 2000-05-12 | 2005-01-18 | Pts Corporation | Methods and apparatus for power control in a scalable array of processor elements |
US6725334B2 (en) * | 2000-06-09 | 2004-04-20 | Hewlett-Packard Development Company, L.P. | Method and system for exclusive two-level caching in a chip-multiprocessor |
US7164422B1 (en) * | 2000-07-28 | 2007-01-16 | Ab Initio Software Corporation | Parameterized graphs with conditional components |
EP1182559B1 (de) * | 2000-08-21 | 2009-01-21 | Texas Instruments Incorporated | Microprocessor |
JP2002123563A (ja) * | 2000-10-13 | 2002-04-26 | Nec Corp | Compiling method, synthesis device, and recording medium |
US20020099759A1 (en) * | 2001-01-24 | 2002-07-25 | Gootherts Paul David | Load balancer with starvation avoidance |
US6836849B2 (en) * | 2001-04-05 | 2004-12-28 | International Business Machines Corporation | Method and apparatus for controlling power and performance in a multiprocessing system according to customer level operational requirements |
AU2002347560A1 (en) * | 2001-06-20 | 2003-01-02 | Pact Xpp Technologies Ag | Data processing method |
US7036114B2 (en) * | 2001-08-17 | 2006-04-25 | Sun Microsystems, Inc. | Method and apparatus for cycle-based computation |
US6625631B2 (en) * | 2001-09-28 | 2003-09-23 | Intel Corporation | Component reduction in montgomery multiplier processing element |
US6668237B1 (en) * | 2002-01-17 | 2003-12-23 | Xilinx, Inc. | Run-time reconfigurable testing of programmable logic devices |
US20030154349A1 (en) * | 2002-01-24 | 2003-08-14 | Berg Stefan G. | Program-directed cache prefetching for media processors |
US20030226056A1 (en) * | 2002-05-28 | 2003-12-04 | Michael Yip | Method and system for a process manager |
AU2003286131A1 (en) * | 2002-08-07 | 2004-03-19 | Pact Xpp Technologies Ag | Method and device for processing data |
WO2005010632A2 (en) * | 2003-06-17 | 2005-02-03 | Pact Xpp Technologies Ag | Data processing device and method |
US6931494B2 (en) * | 2002-09-09 | 2005-08-16 | Broadcom Corporation | System and method for directional prefetching |
US7571303B2 (en) * | 2002-10-16 | 2009-08-04 | Akya (Holdings) Limited | Reconfigurable integrated circuit |
US7155708B2 (en) * | 2002-10-31 | 2006-12-26 | Src Computers, Inc. | Debugging and performance profiling using control-dataflow graph representations with reconfigurable hardware emulation |
US7299458B2 (en) * | 2002-10-31 | 2007-11-20 | Src Computers, Inc. | System and method for converting control flow graph representations to control-dataflow graph representations |
US7412581B2 (en) * | 2003-10-28 | 2008-08-12 | Renesas Technology America, Inc. | Processor for virtual machines and method therefor |
US7299339B2 (en) * | 2004-08-30 | 2007-11-20 | The Boeing Company | Super-reconfigurable fabric architecture (SURFA): a multi-FPGA parallel processing architecture for COTS hybrid computing framework |
US7455450B2 (en) * | 2005-10-07 | 2008-11-25 | Advanced Micro Devices, Inc. | Method and apparatus for temperature sensing in integrated circuits |
US7759968B1 (en) * | 2006-09-27 | 2010-07-20 | Xilinx, Inc. | Method of and system for verifying configuration data |
US8463835B1 (en) * | 2007-09-13 | 2013-06-11 | Xilinx, Inc. | Circuit for and method of providing a floating-point adder |
US20090193384A1 (en) * | 2008-01-25 | 2009-07-30 | Mihai Sima | Shift-enabled reconfigurable device |
Priority applications (2008)
- 2008-11-17 DE: application DE112008003643T, published as DE112008003643A5 (de), status: withdrawn
- 2008-11-17 JP: application JP2010533431A, published as JP2011503733A (ja), status: pending
- 2008-11-17 EP: application EP08850421A, published as EP2220554A1 (de), status: ceased
- 2008-11-17 US: application US12/743,356, published as US20100281235A1 (en), status: abandoned
- 2008-11-17 WO: application PCT/DE2008/001892, published as WO2009062496A1 (de), status: application filing
Non-Patent Citations (2)
Title |
---|
LIBO HUANG ET AL: "A New Architecture For Multiple-Precision Floating-Point Multiply-Add Fused Unit Design", COMPUTER ARITHMETIC, 2007. ARITH '07. 18TH IEEE SYMPOSIUM ON, IEEE, PI, 1 June 2007 (2007-06-01), pages 69 - 76, XP031116327, ISBN: 978-0-7695-2854-0 * |
MANHWEE JO ET AL: "Implementation of floating-point operations for 3D graphics on a coarse-grained reconfigurable architecture", SOC CONFERENCE, 2007 IEEE INTERNATIONAL, IEEE, PISCATAWAY, NJ, USA, 26 September 2007 (2007-09-26), pages 127 - 130, XP031274142, ISBN: 978-1-4244-1592-2 * |
Also Published As
Publication number | Publication date |
---|---|
DE112008003643A5 (de) | 2010-10-28 |
JP2011503733A (ja) | 2011-01-27 |
US20100281235A1 (en) | 2010-11-04 |
EP2220554A1 (de) | 2010-08-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
DE69827589T2 (de) | Configurable processing arrangement and method of using this arrangement to build a central processing unit | |
DE102018005181B4 (de) | Processor for a configurable spatial accelerator with performance, correctness, and energy reduction features | |
DE60100476T2 (de) | Supporting multiple configuration states of an FPGA by means of a processor provided for this purpose on the integrated circuit | |
EP2220554A1 (de) | Reconfigurable floating-point and bit-level data processing unit | |
DE19914210B4 (de) | Method and processor for staggered execution of an instruction | |
DE19722365B4 (de) | Reconfigurable computing component | |
DE60018078T2 (de) | Setting condition values in a computer | |
WO2007082730A1 (de) | Hardware definition method | |
DE102020113922A1 (de) | Multiplier circuitry with reduced latency for very large numbers | |
US20080263319A1 (en) | Universal digital block with integrated arithmetic logic unit | |
DE19814415A1 (de) | Logic analysis subsystem in a time-slice emulator | |
Chiuchisan | A new FPGA-based real-time configurable system for medical image processing | |
EP1927063B1 (de) | Programming and layout design of hardware | |
DE4345029C2 (de) | Circuit for discrete cosine transform | |
DE112017004291T5 (de) | Integrated circuits with specialized processing blocks for performing floating-point fast Fourier transforms and complex multiplication | |
DE102020131154A1 (de) | Hazard mitigation for lightweight processor cores | |
DE102017209856A1 (de) | Computing unit and operating method therefor | |
EP1116129B1 (de) | Configurable hardware block | |
Patle et al. | Implementation of Baugh-Wooley Multiplier Based on Soft-Core Processor | |
EP1789889B1 (de) | Computer device with reconfigurable architecture for accommodating a global cellular automaton | |
CN209118339U (zh) | Accelerator, FPGA-based acceleration system, and CNN network system | |
DE60003847T2 (de) | Universal device and method and tools for its use | |
EP3142032B1 (de) | Method for changing the configuration of a programmable logic device | |
EP2553589B1 (de) | Method for controlling a number of modules | |
EP2043000A2 (de) | Bus systems and reconfiguration method |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 08850421; Country of ref document: EP; Kind code of ref document: A1 |
WWE | WIPO information: entry into national phase | Ref document number: 2010533431; Country of ref document: JP |
WWE | WIPO information: entry into national phase | Ref document number: 2008850421; Country of ref document: EP |
WWE | WIPO information: entry into national phase | Ref document number: 12743356; Country of ref document: US |
REF | Corresponds to | Ref document number: 112008003643; Country of ref document: DE; Date of ref document: 20101028; Kind code of ref document: P |