EP1518186A2 - Method and device for data processing - Google Patents
Method and device for data processing
Info
- Publication number
- EP1518186A2; EP03720231A
- Authority
- EP
- European Patent Office
- Prior art keywords
- data
- data processing
- configuration
- processor
- field
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/084—Multiuser, multiprocessor or multiprocessing cache systems with a shared cache
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/38—Information transfer, e.g. on bus
- G06F13/42—Bus transfer protocol, e.g. handshake; Synchronisation
- G06F13/4204—Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus
- G06F13/4221—Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus being an input/output bus, e.g. ISA bus, EISA bus, PCI bus, SCSI bus
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/78—Architectures of general purpose stored program computers comprising a single central processing unit
- G06F15/7867—Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3877—Concurrent instruction execution, e.g. pipeline or look ahead using a slave processor, e.g. coprocessor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
- G06F9/3893—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator
- G06F9/3895—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator for complex operations, e.g. multidimensional or interleaved address generators, macros
- G06F9/3897—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator for complex operations, e.g. multidimensional or interleaved address generators, macros with adaptable data path
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/62—Details of cache specific to multiprocessor cache arrangements
- G06F2212/621—Coherency control relating to peripheral accessing, e.g. from DMA or I/O device
Definitions
- The present invention concerns the integration and/or close coupling of reconfigurable processors with standard processors, the data exchange between them, the synchronization of data processing, and compilers for such systems.
- A reconfigurable architecture is understood to mean modules (VPU) with configurable function and/or interconnection, in particular integrated modules with a plurality of arithmetic and/or logical and/or analog and/or storage and/or internal/external interconnection modules that are arranged in one or more dimensions and connected to one another directly or through a bus system.
- This category of modules includes, in particular, systolic arrays, neural networks, multiprocessor systems, processors with several arithmetic units and/or logic cells and/or communicative/peripheral (IO) cells, interconnection and network modules such as crossbar switches, as well as known modules of the FPGA, DPGA, Chameleon, XPUTER, etc. type.
- The above architecture is used as an example for clarification and is referred to below as VPU.
- The architecture consists of any, typically coarse-grained, arithmetic cells, logic cells (including memory) and/or memory cells and/or network cells and/or communicative/peripheral (IO) cells (PAEs), which can be arranged in a one- or multi-dimensional matrix (PA). The matrix can contain different cells of any configuration, and the bus systems can also be understood as cells.
- A configuration unit (CT) is assigned to the matrix as a whole or to parts of it and determines the interconnection and function of the PA through configuration.
- A fine-grained control logic can be provided.
- The object of the invention is to provide something new for commercial use.
- The solution to this object is claimed independently.
- Preferred embodiments are found in the subclaims.
- A standard processor (CPU), e.g. a RISC, CISC or DSP processor, is coupled with a reconfigurable processor (VPU).
- A first variant provides a direct connection to the instruction set (ISA) of a CPU (instruction set coupling).
- A second variant provides a connection via tables in the main memory. Both variants can be implemented simultaneously and/or alternatively.
- Decoding a VPUCODE drives a configuration unit (CT) of a VPU, which executes specific operations depending on the VPUCODE.
- A VPUCODE can trigger the loading and/or execution of configurations by the configuration unit (CT) for a VPU.
- Command transfer to the VPU: a VPUCODE can be translated into different VPU commands via a translation table, which is preferably set up by the CPU.
- The configuration table can be set depending on the CPU program or code section being executed.
- The VPU loads configurations from its own memory or from a memory shared, e.g., with the CPU.
- A configuration can be included in the code of the program currently being executed.
- After receiving an execution command, a VPU carries out the configuration to be executed and the corresponding data processing.
- The termination of data processing can be indicated to the CPU by a termination signal (TERM).
- VPUCODE processing on the CPU: if a VPUCODE occurs, wait cycles can be executed on the CPU until the termination signal (TERM) indicating the end of data processing arrives from the VPU.
- Processing of the next codes continues. If a further VPUCODE occurs, the CPU can wait for the previous VPUCODE to finish, or all started VPUCODEs are placed in a processing pipeline, or a task change is carried out as described below.
- The termination of data processing is signaled by the arrival of the termination signal (TERM) in a status register.
- The termination signals arrive in the order of a possible processing pipeline.
- Data processing on the CPU can be synchronized by testing the status register for the arrival of a termination signal.
- If an application cannot continue before TERM arrives, e.g. because of data dependencies, a task change can be triggered.
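The status-register synchronization described above can be illustrated with a minimal software sketch. It is not taken from the patent: the class `StatusRegister`, the function `run_until_term` and the modeling of VPU progress as a list of step callbacks are assumptions made purely for illustration.

```python
from collections import deque


class StatusRegister:
    """Hypothetical CPU status register that collects TERM signals from the VPU."""

    def __init__(self):
        self._term = deque()  # TERM signals, kept in pipeline (issue) order

    def signal_term(self, vpucode):
        # Called on behalf of the VPU when a configuration has finished.
        self._term.append(vpucode)

    def test_term(self):
        # Non-blocking test: return the oldest finished VPUCODE, or None.
        return self._term.popleft() if self._term else None


def run_until_term(status, vpu_steps, other_task):
    """Poll the status register; run another task while TERM is outstanding."""
    switches = 0
    for step in vpu_steps:      # each step models one slice of VPU work
        step(status)
        done = status.test_term()
        if done is not None:
            return done, switches
        other_task()            # task change instead of a plain wait cycle
        switches += 1
    return None, switches
```

In this sketch the CPU never blocks: it either sees TERM in the status register or hands control to another task, which corresponds to the task-change option named in the text.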
- Loose couplings are preferably set up between processors and VPUs, in which the VPUs work largely as independent coprocessors.
- Such a coupling typically provides one or more common data sources and sinks, mostly via common bus systems and/or common memories. Data is exchanged between a CPU and a VPU via DMAs and/or other memory access controllers.
- The synchronization of the data processing preferably takes place via an interrupt control or a status query mechanism (e.g. polling).
- A close coupling corresponds to the direct coupling of a VPU into the instruction set of a CPU described above.
- The wave reconfiguration according to DE 198 07 872, DE 199 26 538, DE 100 28 397 can therefore preferably be used.
- The configuration words are preferably preloaded according to DE 196 54 846, DE 199 26 538, DE 100 28 397, DE 102 12 621 in such a way that, when the command is executed, the configuration can be configured particularly quickly (by means of wave reconfiguration, in the best case within one cycle).
- The configurations that are likely to be executed are preferably recognized in advance by the compiler at compile time, i.e. estimated and/or predicted, and preloaded accordingly at runtime where possible.
- Possible methods are known, for example, from DE 196 54 846, DE 197 04 728, DE 198 07 872, DE 199 26 538, DE 100 28 397, DE 102 12 621.
- Configurations are particularly preferably preloaded into shadow configuration registers, as is known, for example, from DE 197 04 728 (FIG. 6) and DE 102 12 621 (FIG. 14), so that they are available particularly quickly when called up.
- A possible implementation can provide different data transfers between a CPU (0101) and a VPU (0102).
- The configurations to be executed on the VPU are determined by the instruction decoder (0105) of the CPU.
- The VPU can take data from a CPU register (0103), process it and write it back to a CPU register.
- The VPU can receive an RDY signal (DE 196 51 075, DE 110 10 530) when the CPU writes data into a CPU register, and then process the written data. Reading data from a CPU register by the CPU can generate an ACK signal (DE 196 51 075, DE 110 10 530), which signals to the VPU that the CPU has accepted the data.
- CPUs typically do not provide such mechanisms.
- An easy-to-implement approach is to perform data synchronization using a status register (0104).
- The VPU can indicate in the status register that it has read data from a register, together with the associated ACK signal (DE 196 51 075, DE 110 10 530), and/or that it has written data into a register, together with the associated RDY signal (DE 196 51 075, DE 110 10 530).
- The CPU first tests the status register and, for example, executes wait loops or task changes until, depending on the operation, the RDY or ACK has arrived. The CPU then executes the respective register data transfer.
- The CPU instruction set is extended by load/store instructions with an integrated status query (load_rdy, store_ack).
- With store_ack, a new data word is only written to a CPU register if the register was previously read by the VPU and an ACK arrived.
- Correspondingly, load_rdy only reads data from a CPU register if the VPU has previously written new data and generated an RDY.
- Data belonging to a configuration to be executed can be written to or read from the CPU registers successively, in the manner of block moves according to the prior art.
- Implemented block-move instructions can preferably be extended by the integrated RDY/ACK status query described.
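The behavior of load_rdy and store_ack can be sketched in software as follows. This is an illustrative model only: the class `HandshakeRegister`, the methods `vpu_read` and `vpu_write`, and the choice to start with ACK set (register initially free) are assumptions, not details given in the source.

```python
class HandshakeRegister:
    """Hypothetical CPU register shared with the VPU, guarded by RDY/ACK flags."""

    def __init__(self):
        self.value = None
        self.rdy = False   # set when the VPU has written new data
        self.ack = True    # set when the VPU has read the CPU's data (assumed free at start)

    # CPU side
    def store_ack(self, word):
        """Write only if the VPU has consumed the previous word (ACK seen)."""
        if not self.ack:
            return False
        self.value, self.ack = word, False
        return True

    def load_rdy(self):
        """Read only if the VPU has produced new data (RDY seen)."""
        if not self.rdy:
            return None
        self.rdy = False
        return self.value

    # VPU side
    def vpu_read(self):
        word, self.ack = self.value, True
        return word

    def vpu_write(self, word):
        self.value, self.rdy = word, True
```

A failed `store_ack` or `load_rdy` is where a real CPU would insert wait cycles or change tasks; the sketch simply reports the failure to the caller.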
- An additional or alternative variant provides that the data processing within the VPU coupled to the CPU requires exactly the same number of cycles as the data processing within the CPU computing pipeline.
- This concept is particularly well suited to modern high-performance CPUs with a large number of pipeline stages (> 20).
- The particular advantage is that no special synchronization mechanisms such as RDY/ACK are necessary.
- The compiler only needs to ensure that the VPU keeps to the required number of clock cycles and, if necessary, to balance the data processing, e.g. by inserting delay stages such as registers and/or the fall-through FIFOs known from DE 110 10 530, Fig. 9/10.
- The compiler preferably first rearranges the data so that the accesses by the CPU data path and by the VPU are, at least essentially, maximally independent of one another.
- The maximum distance thus defines the maximum runtime difference between the CPU data path and the VPU.
- The runtime difference between the CPU data path and the VPU data path is preferably compensated for by a reordering method, as is known per se from the prior art.
- The compiler can insert NOP cycles (i.e. cycles in which the CPU data path does not process any data), and/or wait cycles can be generated in hardware in the CPU data path, until the necessary data has been written into the register by the VPU.
- The registers can be provided with an additional bit that indicates the presence of valid data.
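The latency balancing described above reduces to simple arithmetic on cycle counts. The following sketch is an assumption-laden illustration: the function name `balance_latency` and the rule "pad the faster path" are hypothetical and only mirror the two remedies named in the text (delay stages on the VPU side, NOP or wait cycles on the CPU side).

```python
def balance_latency(cpu_pipeline_cycles, vpu_cycles):
    """Return (vpu_delay_stages, cpu_nop_cycles) that equalize both paths."""
    if cpu_pipeline_cycles <= 0 or vpu_cycles <= 0:
        raise ValueError("cycle counts must be positive")
    diff = cpu_pipeline_cycles - vpu_cycles
    if diff >= 0:
        return diff, 0      # VPU is faster: pad it with registers or FIFO stages
    return 0, -diff         # VPU is slower: the CPU data path idles (NOP cycles)
```

For a 20-stage CPU pipeline and a 17-cycle configuration, this yields three delay stages on the VPU side and no NOP cycles.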
- The wave reconfiguration already mentioned allows a new VPU instruction and the corresponding configuration to be started successively as soon as the operands of the previous VPU instruction have been removed from the CPU registers.
- The operands for the new command can be written to the CPU registers immediately after the command has started.
- The VPU is successively reconfigured for the new VPU instruction as the data processing of the previous VPU instruction completes, and the new operands are processed.
- Data can be exchanged between a VPU and a CPU by means of suitable bus access to shared resources.
- The VPU can read directly from the external bus (0110) and the associated data source (e.g. memory, peripherals) or write to the external bus and the associated data sink (e.g. memory, peripherals).
- This bus can be the same as the external bus of the CPU (0112, shown dashed). This can be determined by suitable analyses, as far as possible in advance by the compiler at the compile time of the application, and the binary code can be generated accordingly.
- A protocol (0111) between the cache and the bus is preferably implemented, which ensures the correct content of the cache.
- For example, the MESI protocol, known per se from the prior art, can be used for this purpose.
- A particularly preferred method is the close coupling of RAM-PAEs to the cache of the CPU. This enables data to be transferred quickly and efficiently between the memory and/or IO data bus and the VPU. The external data transfer is largely carried out automatically by the cache controller.
- This procedure allows fast and uncomplicated data exchange, especially for task change processes, for real-time applications, and for multithreading CPUs when changing threads.
- The RAM-PAE transfers data, e.g. for reading and/or writing external data and in particular main memory data, directly to and/or from the cache.
- A separate data bus according to DE 196 54 595 and DE 199 26 538 can preferably be used, via which data can be transferred to or from the cache independently of the data processing within the VPU and, in particular, under automatic control, e.g. by independent address generators.
- The RAM-PAEs have no internal memory, but are coupled directly to blocks (slices) of the cache.
- The RAM-PAEs contain only the bus controls for the local buses, as well as any state machines and/or address generators; the memory itself is located within a cache bank to which the RAM-PAE has direct access.
- Each RAM-PAE has its own slice within the cache and can access the cache or its own slice independently of, and in particular simultaneously with, the other RAM-PAEs and/or the CPU. This can be achieved simply by building the cache from several independent banks (slices).
- If the content of a cache slice has been changed by the VPU, it is preferably marked as "dirty", whereupon the cache controller automatically writes it back to the external and/or main memory.
- A write-through strategy can also be implemented or selected for some applications.
- In that case, data written by the VPU to the RAM-PAEs is written back directly into the external and/or main memory with each write operation. This also eliminates the need to mark data as "dirty" and write it back to the external and/or main memory when a task and/or thread change occurs.
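The difference between the "dirty" write-back strategy and the write-through strategy can be illustrated with a small model. Everything named here (`CacheSlice`, `vpu_write`, `vpu_read`, `flush`, main memory as a dictionary) is hypothetical; the sketch makes no claim about how the cache controller is actually built.

```python
class CacheSlice:
    """Hypothetical cache bank (slice) owned by one RAM-PAE."""

    def __init__(self, main_memory, write_through=False):
        self.main_memory = main_memory   # dict: address -> word
        self.write_through = write_through
        self.lines = {}                  # cached address -> word
        self.dirty = set()               # addresses changed by the VPU

    def vpu_write(self, addr, word):
        self.lines[addr] = word
        if self.write_through:
            self.main_memory[addr] = word    # written back immediately
        else:
            self.dirty.add(addr)             # marked "dirty" for later write-back

    def vpu_read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = self.main_memory[addr]   # fill on a miss
        return self.lines[addr]

    def flush(self):
        """Models the cache controller writing dirty data back; returns the count."""
        written = len(self.dirty)
        for addr in self.dirty:
            self.main_memory[addr] = self.lines[addr]
        self.dirty.clear()
        return written
```

With `write_through=True`, `flush` has nothing to do, which mirrors the point that no write-back is needed at a task or thread change.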
- An FPGA (0113) can be coupled to the architecture described, in particular directly to the VPU, to enable fine-grained data processing and/or a flexibly adaptable interface (0114) (e.g. various serial interfaces (V24, USB, etc.), various parallel interfaces, hard disk interfaces, Ethernet, telecommunication interfaces (a/b, T0, ISDN, DSL, etc.)) to other modules and/or the external bus system (0112).
- The FPGA can be operated statically, i.e. without reconfiguration at runtime, and/or dynamically, i.e. with reconfiguration at runtime.
- FPGA elements can be accommodated within an ALU-PAE.
- An FPGA data path can be coupled in parallel to the ALU or, in a preferred embodiment, connected upstream or downstream of the ALU.
- Bit-oriented operations usually occur only sporadically within algorithms written in high-level languages such as C and are not particularly complex. Therefore, an FPGA structure of a few rows of logic elements, each coupled to one another by a row of wiring channels, is sufficient. Such a structure can be integrated into the ALU inexpensively and is simple to program. A significant advantage for the programming methods explained below can be that the throughput time of the FPGA structure is limited in such a way that the runtime behavior of the ALU does not change. Registers need only be permitted for storing data that is used as operands in the next processing cycle.
- Optionally configurable registers are particularly advantageous for producing sequential behavior of the function, for example by pipelining. This is particularly advantageous if feedback occurs in the code for the FPGA structure.
- The compiler can then map such feedback by enabling these registers through configuration, and thus map sequential code correctly.
- The state machine of the PAE, which controls its processing, is informed of the number of registers inserted by configuration so that its control, in particular also the PAE-external data transfer, can adapt to the increased latency.
- The described methods initially do not provide a special mechanism for supporting operating systems. It is preferable to ensure that an operating system to be executed behaves in accordance with the status of a VPU to be supported. In particular, schedulers are required.
- The status register of the CPU is preferably queried, in which the coupled VPU enters its data processing status (termination signal). If further data processing is to be transferred to the VPU and the VPU has not yet ended the previous data processing, a wait is carried out or, preferably, a task change is carried out.
- In principle, the sequence control of a VPU can be carried out directly by a program executed on the CPU, which is essentially the main program that outsources certain subroutines to the VPU. For a coprocessor coupling, mechanisms controlled via the operating system and the scheduler are preferably used:
- A simple scheduler can transfer a function to a VPU.
- The task scheduler switches to another task (e.g. another main program).
- The VPU can continue to work in the background regardless of the current CPU task.
- Each newly activated task that uses the VPU must check before use whether the VPU is available for data processing or is still processing data; in the latter case, it must either wait for the data processing to finish or, preferably, the task is changed.
- To call the VPU, each task generates one or more tables (VPUPROC) with a suitably specified data format in memory. This table contains all control information for a VPU, such as the program/configuration(s) to be executed (or pointers to the corresponding memory locations) and/or the memory location(s) (or pointers to them) and/or data sources (or pointers to them) of the input data and/or the memory location(s) (or pointers to them) of the operands or the result data.
- A table or linked list (LINKLIST, 0201) can be located in the memory area of the operating system; it points to all VPUPROC tables (0202) in the order of their creation and/or their call.
- The data processing on the VPU now proceeds in such a way that a main program creates a VPUPROC and calls the VPU via the operating system.
- The operating system creates an entry in the LINKLIST.
- The VPU processes the LINKLIST and executes the referenced VPUPROC.
- The completion of each data processing is indicated by a corresponding entry in the LINKLIST and/or VPUCALL table.
- Interrupts from the VPU to the CPU can be used as an indication and possibly also for exchanging the VPU status.
- The VPU works largely independently of the CPU.
- The CPU and the VPU can perform independent and different tasks per time unit.
- The operating system and/or the respective task only have to monitor the tables (LINKLIST or VPUPROC).
- The LINKLIST can also be dispensed with by linking the VPUPROCs to one another using pointers, as is known, e.g., from linked lists. Completed VPUPROCs are removed from the list, and new ones are added to the list. The method is known to programmers and therefore need not be explained further.
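The table-based coupling can be pictured with a short data-structure sketch. The field names of `VPUProc`, the class `LinkList`, and the representation of configurations as Python callables are assumptions for illustration; the source specifies only that a VPUPROC holds the configuration, data locations and result locations, and that the LINKLIST references the VPUPROCs in order.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class VPUProc:
    """Hypothetical VPUPROC table: control information for one VPU call."""
    configuration: str            # name of (or pointer to) the configuration
    inputs: list                  # input data (or pointers to it)
    results: list = field(default_factory=list)
    done: bool = False            # completion entry monitored by the task


class LinkList:
    """Hypothetical LINKLIST kept in the operating system's memory area."""

    def __init__(self):
        self._pending = deque()

    def os_call(self, proc):
        self._pending.append(proc)        # entries are kept in call order

    def vpu_process_next(self, configurations):
        """Models the VPU executing the oldest referenced VPUPROC."""
        if not self._pending:
            return None
        proc = self._pending.popleft()    # completed entries leave the list
        proc.results = configurations[proc.configuration](proc.inputs)
        proc.done = True
        return proc
```

A task only has to inspect the `done` field of its own `VPUProc`, which corresponds to monitoring the tables rather than the VPU itself.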
- The use of multithreading and/or hyperthreading technologies is particularly advantageous, in which a scheduler, preferably implemented in hardware, distributes fine-grained applications and/or application parts (threads) to resources within the processor.
- The VPU data path is viewed as a resource for the scheduler.
- The implementation of multithreading and/or hyperthreading technologies in the compiler already provides, by definition, a clear separation of the CPU data path and the VPU data path.
- Parallel utilization of the CPU data path and the VPU data path is favored.
- Multithreading and/or hyperthreading is a preferred method over the LINKLIST described above.
- The two methods work particularly efficiently when an architecture is used as the VPU that permits reconfiguration overlapped with data processing, such as the wave reconfiguration according to DE 198 07 872, DE 199 26 538, DE 100 28 397.
- FIG. 3 shows a possible internal structure of a microprocessor or microcontroller.
- The core (0301) of a microcontroller or microprocessor is shown.
- The exemplary structure also includes a load/store unit for transferring data between the core and the external memory and/or the peripheral devices. The transfer takes place via the interface 0303, to which further units such as MMUs, caches, etc. can be coupled.
- The load/store unit transfers the data to or from a register set (0304), which temporarily stores the data for internal processing.
- The internal processing takes place in one or more data paths (0305), which can each be configured identically or differently.
- Several register sets can also be present, which in turn may be coupled to different data paths (e.g. integer data paths, floating-point data paths, DSP data paths/multiply-accumulate units).
- Data paths typically take operands from the register unit and write the results back to the register unit after data processing.
- An instruction loading unit (opcode fetcher, 0306) is provided.
- The commands are fetched via an interface (0307) to a code memory; if necessary, MMUs, caches, etc. can be interposed.
- The VPU data path (0308) is connected in parallel with data path 0305.
- A VPU data path is known, for example, from DE 196 51 075, DE 100 50 442, DE 102 06 653 and a number of publications by the applicant.
- The VPU data path is configured via the configuration manager (CT) 0310, which loads the configurations from an external memory via a bus (0311).
- The bus 0311 can be identical to 0307; depending on the configuration, one or more caches can be connected between 0311 and 0307 and/or the memory.
- The OpCode fetcher 0306 defines, using special OpCodes, which configuration is to be configured and executed at a specific point in time. For this purpose, a number of possible configurations can be assigned to a series of OpCodes reserved for the VPU data path. The assignment can be made using a reprogrammable lookup table (see 0106), which is connected upstream of 0310, so that the assignment can be freely programmed and changed within the application.
- The target register of the data calculation can be managed in the data register assignment unit (0309).
- The target register defined by the OpCode is loaded into a memory or register (0314), which can be designed as a FIFO in order to allow several VPU data path calls in succession without taking the processing time of the respective configuration into account.
- As soon as a configuration provides the result data, these are linked to the assigned register address (0315), and the corresponding register in 0304 is selected and written.
- This means that a large number of VPU data path calls can be made directly one after the other and in particular overlapping. It is only necessary to ensure, for example by means of compilers or hardware, that the operands and result data are rearranged in relation to the data processing in data path 0305 in such a way that no malfunctions due to different runtimes occur in 0305 and 0308.
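The role of the FIFO (0314) in pairing results with target registers can be sketched as follows. The class `TargetRegisterFifo` and its methods are hypothetical, and the sketch assumes that configurations deliver their results in the order in which they were issued, which the source does not state explicitly.

```python
from collections import deque


class TargetRegisterFifo:
    """Hypothetical model of memory 0314 designed as a FIFO of target registers."""

    def __init__(self, register_set):
        self.register_set = register_set   # models register set 0304
        self._targets = deque()

    def issue(self, target_register):
        # Recorded when the OpCode is decoded, before any result exists.
        self._targets.append(target_register)

    def retire(self, result):
        # A finished configuration delivers its result; it is paired with the
        # oldest outstanding target register and written there.
        target = self._targets.popleft()
        self.register_set[target] = result
        return target
```

Because the target is queued at issue time, several calls can be outstanding at once without the issuing side knowing how long each configuration runs.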
- DE 100 28 397, DE 102 12 621) can preload.
- data access to register set 0304 can also be controlled via memory 0314.
- if a VPU data path configuration that has already been configured is called, no reconfiguration takes place.
- Data is immediately transferred from register set 0304 to the VPU data path for processing.
- the configuration manager saves the currently loaded configuration identification number in a register and compares it with the configuration identification number to be loaded, which is transferred to 0310, for example, via a lookup table (see 0106). Only if the numbers do not match will the called configuration be reconfigured.
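The comparison of configuration identification numbers can be illustrated with a few lines of code. This is a minimal sketch, not the patent's implementation; the class `ConfigurationManager` and the counter `reconfigurations` are hypothetical names introduced for illustration.

```python
class ConfigurationManager:
    """Hypothetical model of the ID comparison performed by the CT (0310)."""

    def __init__(self):
        self.loaded_id = None      # register holding the currently loaded ID
        self.reconfigurations = 0  # how often the array was actually reconfigured

    def call(self, config_id):
        """Return True if the call required a reconfiguration."""
        if config_id == self.loaded_id:
            return False           # IDs match: data can be processed immediately
        self.loaded_id = config_id
        self.reconfigurations += 1
        return True


ct = ConfigurationManager()
history = [ct.call(config_id) for config_id in (7, 7, 9, 7)]
```

In this run only the second call finds its configuration already loaded; the other three calls trigger a reconfiguration.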
- the load / store unit is only shown schematically and fundamentally in FIG. 3; a preferred embodiment is shown in detail in FIGS. 4 and 5.
- the VPU data path (0308) can transfer data directly with the load/store unit and/or the cache; via another application-dependent data path 0313, data can be transferred directly between the VPU data path (0308) and peripheral devices and/or external memory.
- FIG. 4 shows a particularly preferred embodiment of the load / store unit.
- An essential data processing principle of the VPU architecture provides for memory cells coupled to the array of ALU-PAEs, which serve as a kind of register set for data blocks. The method is known from DE 196 54 846, DE 101 39 170, DE 199 26 538 and DE 102 06 653. For this purpose, it is advisable, as described below, to process LOAD and STORE commands as a configuration within the VPU, which eliminates the need to interconnect the VPU with the load/store unit (0401) of the CPU. In other words, the VPU generates its read and write accesses itself, which makes a direct connection (0404) to the external and/or main memory useful.
- a cache (0402), which can be the same as the data cache of the processor.
- the load / store unit of the processor (0401) accesses the cache directly and in parallel with the VPU (0403) without - unlike 0302 - having a data path for the VPU.
- FIG. 5 shows particularly preferred connections of the VPU to the external and / or main memory via a cache.
- the simplest connection method is via an IO connection of the VPU, as known for example from DE 196 51 075.9-53, DE 196 54 595.1-53, DE 100 50 442.6 and DE 102 06 653.1, via which addresses and data are transferred between peripherals and/or memory and the VPU.
- direct connections between the RAM-PAEs and the cache are particularly powerful, as is known from DE 196 54 595 and DE 199 26 538.
- a PAE is shown as an example of a reconfigurable data processing element, made up of a main data processing unit (0501), which is typically designed as an ALU, RAM, FPGA or IO connection, and two side data transmission units (0502, 0503), which in turn can have an ALU structure and/or a register structure.
- the horizontal internal bus systems 0504a and 0504b belonging to the PAE are also shown.
- in FIG. 5a, RAM-PAEs (0501a), each of which contains its own memory according to DE 196 54 595 and DE 199 26 538, are coupled to a cache 0510 via a multiplexer 0511.
- the cache controller and the connection bus of the cache to the main memory are not shown.
- the RAM-PAEs preferably have a separate data bus (0512) with their own address generators (see also DE 102 06 653) in order to be able to transfer data independently into the cache.
- Figure 5b shows an optimized variant.
- 0501b are not fully-fledged RAM-PAEs, but only contain the bus systems and side data transmission units (0502, 0503). Instead of the integrated memory in 0501, only a bus connection (0521) to cache 0520 is implemented.
- the cache is divided into several segments 05201, 05202 ... 0520n, which are each assigned to a 0501b and are preferably reserved exclusively for this 0501b.
- the cache thus represents the totality of all RAM-PAEs of the VPU and the data cache (0522) of the CPU.
- the VPU writes its internal (register) data directly into the cache or reads it directly from the cache. Changed data can be marked as "dirty", whereupon the cache controller (not shown) automatically updates it in the main memory. Alternatively, write-through methods are available, in which changed data is written directly to the main memory and the management of the "dirty" flags becomes superfluous.
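The difference between marking data as "dirty" and writing through can be shown with a toy model. The following is only a sketch under assumptions: `SegmentCache`, its `flush` method and the dictionary used as main memory are hypothetical stand-ins, and the explicit `flush` call replaces the automatic update that a real cache controller would perform.

```python
class SegmentCache:
    """Hypothetical cache segment supporting write-back or write-through."""

    def __init__(self, main_memory, write_through=False):
        self.main_memory = main_memory
        self.write_through = write_through
        self.lines = {}     # cached copies, indexed by address
        self.dirty = set()  # addresses changed but not yet written back

    def write(self, address, value):
        self.lines[address] = value
        if self.write_through:
            self.main_memory[address] = value  # no dirty bookkeeping needed
        else:
            self.dirty.add(address)            # defer the update

    def read(self, address):
        if address not in self.lines:
            self.lines[address] = self.main_memory[address]
        return self.lines[address]

    def flush(self):
        """Write all dirty entries back, as a cache controller would."""
        for address in sorted(self.dirty):
            self.main_memory[address] = self.lines[address]
        self.dirty.clear()


memory_wb = {0: 0, 1: 0}
write_back = SegmentCache(memory_wb)
write_back.write(0, 42)
before_flush = memory_wb[0]  # main memory still holds the old value
write_back.flush()

memory_wt = {0: 0, 1: 0}
write_thru = SegmentCache(memory_wt, write_through=True)
write_thru.write(1, 7)       # main memory is updated immediately
```

With write-through, the set of dirty addresses stays empty at all times, which is why the bookkeeping becomes superfluous.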
- FIG. 5b shows the direct coupling of an FPGA structure into a data path using the example of the VPU architecture.
- 0501 is the main data path of a PAE.
- FPGA structures are preferably inserted directly after the input registers (cf. PACT02, PACT22) (0611) and / or directly before the output of the data path onto the bus system (0612).
- a possible FPGA structure is shown in 0610, the structure is based on PACT13 Figure 35.
- the FPGA structure is coupled into the ALU via a data input (0605) and a data output (0606).
- a) logic elements are arranged in one line (0601), which perform bitwise logical (AND, OR, NOT, XOR, etc.) operations on incoming data.
- these logic elements may additionally have local bus systems, and registers for storing data can likewise be provided in the logic elements.
- a vertical network (0604) can be provided for signal transmission, which is also constructed in accordance with the known FPGA networks. Using this network, signals can be transmitted past several rows of elements 0601 and 0602. Since elements 0601 and 0602 typically already have a number of vertical bypass signal networks, 0604 is optional and is required only for a large number of rows.
- in order to match the state machine of the PAE to the pipeline depth configured in 0610, i.e. the number (NRL) of configured register stages (0602) between the input (0605) and the output (0606), a register 0607 is implemented, into which NRL is configured. Based on this value, the state machine coordinates the generation of the PAE-internal control cycles and in particular also the handshake signals (PACT02, PACT16, PACT18) for the PAE-external bus systems. Further possible FPGA structures are known, for example, from Xilinx and Altera; these preferably have a register structure after 0610.
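The effect of the configured register depth NRL on the timing of valid output data can be illustrated with a cycle-level simulation. This is a simplified sketch under assumptions: the function `run_pipeline` is hypothetical, one input word is accepted per cycle, and the handshake is reduced to the cycle number at which a result becomes valid.

```python
def run_pipeline(inputs, nrl, function):
    """Simulate `function` followed by `nrl` register stages.

    Returns (cycle, value) pairs for every cycle in which valid data
    leaves the structure, i.e. when the output handshake may be raised.
    """
    stages = [None] * nrl                 # contents of the register stages
    outputs = []
    stream = list(inputs) + [None] * nrl  # extra cycles drain the pipeline
    for cycle, item in enumerate(stream):
        value = function(item) if item is not None else None
        if nrl == 0:
            emerging = value              # purely combinational path
        else:
            emerging = stages[-1]         # word leaving the last register
            stages = [value] + stages[:-1]
        if emerging is not None:
            outputs.append((cycle, emerging))
    return outputs


def invert_byte(word):
    """Bitwise logic function used as the example payload."""
    return word ^ 0xFF


registered = run_pipeline([1, 2, 3], nrl=2, function=invert_byte)
combinational = run_pipeline([1, 2, 3], nrl=0, function=invert_byte)
```

With two register stages the first result appears at cycle 2 instead of cycle 0, so a state machine that knows NRL can delay its handshake signals by exactly that many cycles.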
- FIG. 7 shows several strategies for achieving code compatibility between VPUs of different sizes:
- 0701 is an arrangement of ALU-PAEs (0702) and RAM-PAEs (0703) which defines a possible "small" VPU. In the following, it is assumed that code has been generated for this structure and is now to be processed on other, larger VPUs.
- a first possible approach is to recompile the code for the new target VPU.
- this offers the advantage that functions that no longer exist in a new target VPU can be emulated: the compiler instantiates macros for these functions, which then emulate the original function.
- the emulation can be done either by using multiple PAEs and/or by using sequencers as described below (for example, for division, floating point, complex mathematics, etc.) and as known, for example, from PACT02.
- the clear disadvantage of the method is that the binary compatibility is lost.
- a first simple method involves the insertion of "wrapper" code (0704), which extends the bus systems between a small ALU-PAE array and the RAM-PAEs.
- the code only contains the configuration for the bus systems and is inserted into the existing binary code, for example at configuration time and / or at loading time from a memory.
- FIG. 7a b) shows a simple, optimized variant in which the lengthening of the bus systems is compensated for and is therefore less frequency-critical, since the running time for the wrapper bus system is halved compared to FIG. 7a a).
- the method according to FIG. 7b can be used for higher frequencies, in which a larger VPU represents a superset of the compatible small VPU (0701) and the complete structures of 0701 are replicated. Direct binary compatibility is thus simply given.
- an optimal method provides for additional high-speed bus systems which have a connection (0705) to each PAE or to a group of PAEs.
- such bus systems are known from the applicant's other patent applications, for example from PACT07.
- via the connections 0705, the data is transferred to a high-speed bus system (0706), which then transmits it over a large distance in a performance-efficient manner.
- Ethernet, RapidIO, USB, AMBA, RAMBUS and other industry standards can be used as such high-speed bus systems.
- connection to the high-speed bus system can either be inserted using a wrapper as described for FIG. 7a, or it may already be provided for 0701 in terms of architecture. In this case, at 0701, the connection is simply forwarded directly to the neighboring cell and is not used.
- the hardware abstracts the absence of the bus system.
- Prior art parallelizing compilers typically use special constructs such as semaphores and / or other methods of synchronization.
- Technology-specific processes are typically used.
- known methods are not suitable for combining functionally specified architectures with the associated time behavior and imperatively specified algorithms. Therefore, the methods used only provide satisfactory solutions in special cases.
- Compilers for reconfigurable architectures usually use macros that have been created specifically for the particular reconfigurable hardware, the macros mostly being written in hardware description languages (such as Verilog, VHDL, SystemC). These macros are then called (instantiated) from the program flow of a normal high-level language (e.g. C, C++).
- compilers for parallel computers are known, which map program parts onto several processors at a coarse-grained level, usually based on complete functions or threads.
- vectorizing compilers are known, which convert largely linear data processing, such as calculations of large expressions, into a vectorized form and thus enable the calculation on superscalar processors and vector processors (e.g. Pentium, Cray).
- This patent therefore further describes a method for the automatic mapping of functionally or imperatively formulated computation rules to different target technologies, in particular to ASICs, reconfigurable components (FPGAs, DPGAs, VPUs, ChessArray, KressArray, Chameleon, etc.; hereinafter summarized under the term VPU), sequential processors (CISC/RISC CPUs, DSPs, etc.; hereinafter summarized under the term CPU) and parallel computer systems (SMP, MMP, etc.).
- VPUs basically consist of a multidimensional homogeneous or inhomogeneous, flat or hierarchical arrangement (PA) of cells (PAEs) that can perform any functions, in particular logical and/or arithmetic functions (ALU-PAEs) and/or memory functions (RAM-PAEs) and/or network functions.
- a loading unit (CT) is assigned to the PAEs, which determines the function of the PAEs through configuration and, if necessary, reconfiguration.
- the method is based on an abstract parallel machine model which, in addition to the finite automaton, also integrates imperative problem specifications and enables an efficient algorithmic derivation of an implementation on different technologies.
- the invention is a further development of the compiler technology according to DE 101 39 170.6, which describes in particular the close XPP connection to a processor within its data paths and discloses a compiler which is particularly suitable for this purpose and which also uses XPP standalone systems without close processor coupling.
- Vectorizing compilers build largely linear code that is tailored to special vector computers or heavily pipelined processors. These compilers were originally available for vector computers such as CRAY. Due to their long pipeline structure, modern processors such as the Pentium require similar methods. Since the individual calculation steps are vectorized (pipelined), the code is much more efficient. However, conditional jumps pose problems for the pipeline. Jump prediction, which assumes a jump target, therefore makes sense. If the assumption is wrong, the entire processing pipeline must be flushed. In other words, every jump is problematic for these compilers, and parallel processing in the actual sense is not given. Jump prediction and similar mechanisms require a considerable amount of additional hardware.
- Coarse-grained parallel compilers hardly exist in the actual sense; the parallelism is typically marked and managed by the programmer or the operating system, for example in MMP computer systems such as various IBM architectures, ASCI Red, etc., mostly at thread level. A thread is a largely independent program block or even a separate program. Coarse-grained threads are therefore easy to parallelize. Synchronization and data consistency must be ensured by the programmer or the operating system.
- Reconfigurable processors have a large number of independent computing units. These are not connected to each other through a common register set, but by buses. On the one hand, this makes it easy to set up vector arithmetic units, and on the other hand, simple parallel operations can also be performed. Contrary to conventional register concepts, data dependencies are resolved by the bus connections.
- VLIW vectorizing compilers and parallelizing
- a major advantage is that the compiler does not have to map to a predefined hardware structure, but rather the hardware structure is configured in such a way that it is optimally suited for mapping the respective compiled algorithm.
- Modern processors usually have a set of user-definable instructions (UDI) that are available for hardware expansions and/or special coprocessors and accelerators. If UDIs are not available, processors have at least free, as yet unused commands and/or special commands for coprocessors. For the sake of simplicity, all these commands are summarized below under the term UDI.
- UDI user-definable instructions
- a number of these UDIs can now be used to drive a VPU coupled into the processor as a data path.
- loading and / or deleting and / or starting configurations can be triggered by UDIs, specifically a specific UDI can refer to a constant and / or changing configuration.
- Configurations are preferably preloaded into a configuration cache, which is assigned locally to the VPU, and/or preloaded into configuration stacks according to DE 196 51 075.9-53, DE 197 04 728.9 and DE 102 12 621.6-53, from which they can be quickly configured and executed at runtime when a UDI that starts a configuration occurs.
- the configuration can be preloaded in a configuration manager shared by several PAEs or PAs and / or in a local configuration memory on and / or in a PAE, in which case only the activation then has to be initiated.
- a set of configurations is preferably preloaded.
- each configuration preferably corresponds to a load UDI.
- the load UDIs each reference one configuration.
- it is also possible for a load UDI to reference a complex configuration arrangement, so that, for instance, very extensive functions that require the array to be reloaded several times during execution, a (possibly repeated) wave reconfiguration, etc. can be referenced by a single UDI.
- a specific load UDI can thus reference a first configuration at a first point in time and reference a meanwhile newly loaded second configuration at a second point in time. This can be done, for example, by changing an entry in a reference list that is accessed according to the UDI.
- a LOAD/STORE machine model is used, as is known for example from RISC processors. Every configuration is understood as a command.
- the LOAD and STORE configurations are separate from the data processing configurations.
- a data processing sequence accordingly proceeds, for example, as follows:
- LOAD configuration: loads the data from, for example, an external memory, a ROM of an SoC in which the overall arrangement is integrated, and/or the peripherals into the internal memory banks (RAM-PAEs, see DE 196 54 846.2-53, DE 100 50 442.6).
- the configuration includes, if necessary, address generators and/or access controls in order to read data from processor-external memories and/or peripherals and to write them into the RAM-PAEs.
- the RAM-PAEs can be understood as multidimensional data registers (e.g. vector registers).
- the data processing configurations are configured sequentially one after the other in the PA. As in a LOAD/STORE (RISC) processor, the data processing preferably takes place exclusively between the RAM-PAEs, which are used as multidimensional data registers.
- n. STORE configuration
- RAM-PAEs internal memory banks
- the configuration includes address generators and/or access controls in order to write data from the RAM-PAEs to the processor-external memories and/or peripherals.
- the address generation functions of the LOAD / STORE configurations are optimized in such a way that, for example in the case of a non-linear access sequence of the algorithm to external data, the corresponding address patterns are generated by the configurations.
- the compiler analyzes the algorithms and creates the address generators for LOAD/STORE. This working principle can easily be illustrated by the processing of loops. For example, a VPU with RAM-PAEs that are 256 entries deep can be assumed.
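How a long loop could be broken into LOAD, data processing and STORE configurations for RAM-PAEs of 256 entries can be sketched as follows. This is an illustration written for this passage, not the example from the patent: the function `process_in_cycles`, the loop length of 1000 and the kernel `2 * x + 1` are assumptions.

```python
RAM_PAE_DEPTH = 256  # assumed depth of one RAM-PAE, taken from the text above


def process_in_cycles(external_data, kernel, depth=RAM_PAE_DEPTH):
    """Split a loop over external data into LOAD-PROCESS-STORE cycles."""
    external_results = []
    cycles = 0
    for start in range(0, len(external_data), depth):
        ram_in = external_data[start:start + depth]  # LOAD configuration
        ram_out = [kernel(x) for x in ram_in]        # data processing configuration
        external_results.extend(ram_out)             # STORE configuration
        cycles += 1
    return external_results, cycles


data = list(range(1000))
results, cycles = process_in_cycles(data, lambda x: 2 * x + 1)
```

A loop of 1000 iterations therefore needs four cycles, the last of which only fills part of the RAM-PAE.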
- each configuration is considered atomic - that is, not interruptible. This solves the problem that the internal data of the PA and the internal status must be saved in the event of an interruption. During the execution of a configuration, the respective status is written to the RAM-PAEs together with the data.
- the disadvantage of the method is that initially no statement can be made about the runtime behavior of a configuration.
- the run time limitation is not a major disadvantage, since an upper limit is typically already determined by the size of the RAM-PAEs and the associated amount of data.
- the size of the RAM-PAEs expediently corresponds to the maximum number of data processing cycles of a configuration, whereby a typical configuration is limited to a few 100 to 1000 cycles.
- This restriction means that multithreading / hyperthreading and real-time processes can be implemented together with a VPU.
- the run time of configurations is preferably monitored by a tracking counter or watchdog, e.g. a counter running with the clock or another signal.
- the watchdog triggers an interrupt and / or trap, which can be understood and handled by processors in a similar way to an "illegal opcode" trap.
- a restriction can alternatively be introduced to reduce reconfiguration processes and to increase performance:
- Running configurations can retrigger the watchdog and thus run longer without having to be changed.
- a retrigger is only permitted if the algorithm has reached a "safe" state (synchronization time) in which all data and states are written to the RAM-PAEs and an interruption is algorithmically permitted.
- the disadvantage of this extension is that a configuration could run into a deadlock as part of its data processing, yet still retrigger the watchdog properly and thus never terminate.
- a blockage of the VPU resource by such a zombie configuration can be prevented in that the retriggering of the watchdog can be suppressed by a task change, so that the configuration is changed at the next synchronization time or after a predetermined number of synchronization times. As a result, the task containing the zombie no longer terminates, but the overall system continues to run properly.
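The interplay of watchdog, retrigger and task change can be modeled in a few lines. The following is a simplified sketch under assumptions: the class `Watchdog`, the function `run` and all cycle counts are hypothetical, synchronization times are assumed to occur at a fixed interval, and a refused retrigger is modeled as an immediate change of configuration.

```python
class Watchdog:
    """Hypothetical run-time monitor with retrigger support."""

    def __init__(self, limit):
        self.limit = limit
        self.count = 0
        self.retrigger_allowed = True

    def tick(self):
        """Advance one clock; return True when the trap must be raised."""
        self.count += 1
        return self.count >= self.limit

    def retrigger(self):
        """Reset the counter at a synchronization time, if still permitted."""
        if self.retrigger_allowed:
            self.count = 0
            return True
        return False

    def request_task_change(self):
        """A pending task change suppresses further retriggers."""
        self.retrigger_allowed = False


def run(length, sync_every, limit, task_change_at=None):
    """Run a configuration and report how and when it ended."""
    dog = Watchdog(limit)
    for cycle in range(1, length + 1):
        if cycle == task_change_at:
            dog.request_task_change()
        if cycle % sync_every == 0 and not dog.retrigger():
            return ("switched", cycle)  # configuration changed at a safe state
        if dog.tick():
            return ("trap", cycle)      # handled like an illegal-opcode trap
    return ("done", length)


normal = run(length=1000, sync_every=50, limit=100)
preempted = run(length=1000, sync_every=50, limit=100, task_change_at=120)
runaway = run(length=1000, sync_every=500, limit=100)
```

A configuration that retriggers regularly runs to completion, one that faces a task change is replaced at the next synchronization time, and one that never reaches a safe state in time is stopped by the trap.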
- Multithreading and/or hyperthreading for the machine model or the processor can optionally be introduced as a further method. All VPU routines, i.e. their configurations, are then preferably considered as separate threads. Since the VPU is coupled into the processor as an arithmetic unit, it can be regarded as a resource for the threads.
- the scheduler implemented according to the state of the art for multithreading (see also P 42 21 278.2-09) automatically distributes threads programmed for VPUs (VPU threads) among them. In other words, the scheduler automatically distributes the different tasks within the processor. This creates a further level of parallelism. Both pure processor threads and VPU threads are processed in parallel and can be managed automatically by the scheduler without any special measures.
- the method is particularly efficient if the compiler, as preferred and regularly possible, breaks down programs into a plurality of threads that can be processed in parallel and thereby divides all the VPU program sections into individual VPU threads.
- several VPU data paths which are each considered as an independent resource, can be implemented. At the same time, this also increases the degree of parallelism, since several VPU data paths can be used in parallel.
- VPU resources can be reserved for interrupt routines, so that a response to an incoming interrupt does not have to wait until the atomic, non-interruptible configurations have been terminated.
- VPU resources can be blocked for interrupt routines, i.e. no interrupt routine can use a VPU resource and/or contain a corresponding thread. This also gives fast interrupt response times. Since no or only a few VPU-performing algorithms typically occur within interrupt routines, this method is preferred. If the interrupt leads to a task change, the VPU resource can be terminated in the meantime; sufficient time is usually available for this during the task change.
- a problem that arises when changing tasks can be that the previously described LOAD-PROCESS-STORE cycle has to be interrupted without all data and / or status information from the RAM-PAEs having been written into the external RAMs and / or peripheral devices.
- a configuration PUSH is now introduced, which is inserted, e.g. during a task change, between the configurations of the LOAD-PROCESS-STORE cycle.
- PUSH backs up the internal memory contents of the RAM-PAEs externally, e.g. on a stack; external here means, for example, external to the PA or a PA part, but can also refer to peripherals, etc.
- PUSH thus corresponds in principle to the procedure of classic processors.
- the task can be changed, ie the current LOAD-PROCESS-STORE cycle can be canceled and a LOAD-PROCESS-STORE cycle of the next task can be executed.
- when the system switches back to the corresponding task, the interrupted LOAD-PROCESS-STORE cycle is resumed at the configuration (KATS) that follows the last configuration performed.
- KATS configuration
- corresponding to the methods in known processors, the data for the RAM-PAEs is loaded from the external memories, for example the stack.
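Saving and restoring the RAM-PAE contents around a task change can be sketched like this. It is a minimal model under assumptions: the class `VpuContext` is hypothetical, a Python list stands in for the external stack, and the restoring step is called `pop` here purely by analogy with classic processors.

```python
class VpuContext:
    """Hypothetical model of RAM-PAE contents and an external stack."""

    def __init__(self, num_ram_paes, depth):
        self.ram_paes = [[0] * depth for _ in range(num_ram_paes)]
        self.stack = []  # stands in for external memory

    def push(self):
        """Back up all RAM-PAE contents externally before a task change."""
        self.stack.append([list(bank) for bank in self.ram_paes])

    def pop(self):
        """Reload the RAM-PAE contents before the interrupted cycle resumes."""
        self.ram_paes = self.stack.pop()


vpu = VpuContext(num_ram_paes=2, depth=4)
vpu.ram_paes[0][1] = 99                     # intermediate result of task A
vpu.push()                                  # task change: save task A
vpu.ram_paes = [[7] * 4 for _ in range(2)]  # task B overwrites the RAM-PAEs
task_b_value = vpu.ram_paes[0][1]
vpu.pop()                                   # back to task A
```

After the restore, task A finds its intermediate result unchanged and can continue with the next configuration of its cycle.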
- the direct access of the RAM-PAEs to a cache or the direct implementation of the RAM-PAEs in a cache means that the memory contents can be exchanged quickly and easily when a task is changed.
- Case A The RAM-PAE contents are written into the cache via a preferably separate and independent bus and reloaded from it.
- the cache is managed by a cache controller according to the state of the art. Only the RAM PAEs that have been changed compared to the original content have to be written to the cache. For this purpose, a "dirty" flag can be introduced for the RAM-PAEs, which indicates whether a RAM-PAE has been written to and changed. It should be mentioned that appropriate hardware means for implementation can be provided for this.
- Case B: The RAM-PAEs are located directly in the cache and are marked there as special storage locations that are not influenced by the normal data transfers between processor and memory. When the task is changed, other cache sections are referenced. Modified RAM-PAEs can be marked as dirty.
- the cache is managed by the cache controller.
- the LOAD PROCESS STORE cycle allows a particularly efficient debugging method of the program code according to DE 101 42 904.5. If, as is preferred, each configuration is considered to be atomic and therefore uninterruptible, the data and / or states relevant for debugging are basically in the RAM-PAEs after the processing of a configuration has ended. The debugger therefore only has to access the RAM-PAEs in order to receive all essential data and / or states.
- a mixed-mode debugger according to DE 101 42 904.5 is used, in which the RAM-PAE contents are read before and after a configuration and the configuration itself is checked by means of a simulator that simulates its execution. If the simulation results do not match the memory contents of the RAM-PAEs after the configuration processed on the VPU has finished, the simulator is not consistent with the hardware, and there is either a hardware or a simulator error, which must then be checked by the manufacturer of the hardware or of the simulation software.
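The comparison between simulated and actual RAM-PAE contents can be expressed as a short check. This is only a sketch under assumptions: the function names `check_configuration` and `simulate_double`, the bank names `in` and `out` and the doubling kernel are hypothetical and serve only to illustrate the principle.

```python
def check_configuration(ram_before, ram_after_hw, simulate):
    """Return (bank, index, simulated, hardware) for every mismatch."""
    simulated = simulate(ram_before)
    mismatches = []
    for bank, sim_values in simulated.items():
        for index, (sim, hw) in enumerate(zip(sim_values, ram_after_hw[bank])):
            if sim != hw:
                mismatches.append((bank, index, sim, hw))
    return mismatches


def simulate_double(ram):
    """Simulator for an assumed configuration that doubles every input."""
    return {"in": list(ram["in"]), "out": [2 * x for x in ram["in"]]}


before = {"in": [1, 2, 3], "out": [0, 0, 0]}
consistent = check_configuration(
    before, {"in": [1, 2, 3], "out": [2, 4, 6]}, simulate_double)
inconsistent = check_configuration(
    before, {"in": [1, 2, 3], "out": [2, 5, 6]}, simulate_double)
```

An empty result means that hardware and simulator agree for this configuration; any entry points to the exact RAM-PAE location where they diverge.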
- the PAEs can have sequencers according to DE 196 51 075.9-53 (FIGS. 17, 18, 21) and / or DE 199 26 538.0, for example entries in the configuration stack (cf. DE 197 04 728.9, DE 100 28 397.7, DE 102 12 621.6-53) can be used as code memory for a sequencer.
- sequencers are usually very difficult for compilers to control and use. For this reason, pseudo codes onto which compiler-generated assembly instructions are mapped are preferably provided for these sequencers. For example, it is inefficient to provide hardware opcodes for division, root, powers, geometric operations, complex mathematics, floating point commands, etc. Such instructions are therefore implemented as multi-cycle sequencer routines, the compiler instantiating such macros via the assembler if necessary.
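A division without a hardware opcode can be carried out as a routine of several simple steps. The sketch below uses the well-known restoring (shift and subtract) algorithm as one possible example; the function name `sequencer_divide` and the 16-bit width are assumptions, and the patent does not state which algorithm such a macro would use.

```python
def sequencer_divide(dividend, divisor, width=16):
    """Unsigned restoring division, producing one quotient bit per step."""
    if divisor <= 0:
        raise ZeroDivisionError("divisor must be a positive integer")
    if not 0 <= dividend < (1 << width):
        raise ValueError("dividend does not fit into the given width")
    remainder = 0
    quotient = 0
    steps = 0
    for bit in range(width - 1, -1, -1):
        # Shift the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        quotient <<= 1
        if remainder >= divisor:  # the subtraction succeeds: quotient bit is 1
            remainder -= divisor
            quotient |= 1
        steps += 1
    return quotient, remainder, steps


quotient, remainder, steps = sequencer_divide(1000, 7)
```

Each loop iteration only needs a shift, a compare and a subtract, which are operations an ALU-PAE can be expected to provide, at the cost of one step per result bit.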
- the compiler If logical operations occur within the program to be translated by the compiler, e.g. &,
- registers are configured after the function in the FPGA unit, which cause a delay by one clock and thus synchronization.
- the number of register stages inserted per FPGA unit is written into a delay register when the generated configuration is configured onto the VPU; this register controls the state machine of the PAE.
- the state machine can adapt the management of the handshake protocols to the additional pipeline level that occurs.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP03720231A EP1518186A2 (en) | 2002-03-21 | 2003-03-21 | Method and device for data processing |
Applications Claiming Priority (56)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DE10212622 | 2002-03-21 | ||
DE10212622A DE10212622A1 (en) | 2002-03-21 | 2002-03-21 | Computer program translation method allows classic language to be converted for system with re-configurable architecture |
DE10212621 | 2002-03-21 | ||
DE10212621 | 2002-03-21 | ||
EP02009868 | 2002-05-02 | ||
DE10219681 | 2002-05-02 | ||
EP02009868 | 2002-05-02 | ||
DE10219681 | 2002-05-02 | ||
DE10226186A DE10226186A1 (en) | 2002-02-15 | 2002-06-12 | Data processing unit has logic cell clock specifying arrangement that is designed to specify a first clock for at least a first cell and a further clock for at least a further cell depending on the state |
DE10226186 | 2002-06-12 | ||
WOPCT/EP02/06865 | 2002-06-20 | ||
DE10227650A DE10227650A1 (en) | 2001-06-20 | 2002-06-20 | Reconfigurable elements |
DE10227650 | 2002-06-20 | ||
PCT/EP2002/006865 WO2002103532A2 (en) | 2001-06-20 | 2002-06-20 | Data processing method |
DE10236271 | 2002-08-07 | ||
DE10236272 | 2002-08-07 | ||
DE10236269 | 2002-08-07 | ||
DE10236272 | 2002-08-07 | ||
DE10236269 | 2002-08-07 | ||
DE10236271 | 2002-08-07 | ||
PCT/EP2002/010065 WO2003017095A2 (en) | 2001-08-16 | 2002-08-16 | Method for the translation of programs for reconfigurable architectures |
WOPCT/EP02/10065 | 2002-08-16 | ||
DE10238174A DE10238174A1 (en) | 2002-08-07 | 2002-08-21 | Router for use in networked data processing has a configuration method for use with reconfigurable multi-dimensional fields that includes specifications for handling back-couplings |
DE10238172 | 2002-08-21 | ||
DE10238173A DE10238173A1 (en) | 2002-08-07 | 2002-08-21 | Cell element field for processing data has function cells for carrying out algebraic/logical functions and memory cells for receiving, storing and distributing data. |
DE10238174 | 2002-08-21 | ||
DE10238172A DE10238172A1 (en) | 2002-08-07 | 2002-08-21 | Cell element field for processing data has function cells for carrying out algebraic/logical functions and memory cells for receiving, storing and distributing data. |
DE10238173 | 2002-08-21 | ||
DE10240000A DE10240000A1 (en) | 2002-08-27 | 2002-08-27 | Router for use in networked data processing has a configuration method for use with reconfigurable multi-dimensional fields that includes specifications for handling back-couplings |
DE10240022 | 2002-08-27 | ||
DE10240022 | 2002-08-27 | ||
DE10240000 | 2002-08-27 | ||
WOPCT/DE02/03278 | 2002-09-03 | ||
PCT/DE2002/003278 WO2003023616A2 (en) | 2001-09-03 | 2002-09-03 | Method for debugging reconfigurable architectures |
DE10241812A DE10241812A1 (en) | 2002-09-06 | 2002-09-06 | Cell element field for processing data has function cells for carrying out algebraic/logical functions and memory cells for receiving, storing and distributing data. |
DE10241812 | 2002-09-06 | ||
WOPCT/EP02/10479 | 2002-09-18 | ||
WOPCT/EP02/10464 | 2002-09-18 | ||
PCT/EP2002/010479 WO2003025781A2 (en) | 2001-09-19 | 2002-09-18 | Router |
EP0210464 | 2002-09-18 | ||
EP02010572 | 2002-09-19 | ||
PCT/EP2002/010572 WO2003036507A2 (en) | 2001-09-19 | 2002-09-19 | Reconfigurable elements |
EP02022692 | 2002-10-10 | ||
EP02022692 | 2002-10-10 | ||
EP02027277 | 2002-12-06 | ||
EP02027277 | 2002-12-06 | ||
DE10300380 | 2003-01-07 | ||
DE10300380 | 2003-01-07 | ||
PCT/DE2003/000152 WO2003060747A2 (en) | 2002-01-19 | 2003-01-20 | Reconfigurable processor |
WOPCT/DE03/00152 | 2003-01-20 | ||
PCT/EP2003/000624 WO2003071418A2 (en) | 2002-01-18 | 2003-01-20 | Method and device for partitioning large computer programs |
EP03000624 | 2003-01-20 | ||
WOPCT/DE03/00489 | 2003-02-18 | ||
PCT/DE2003/000489 WO2003071432A2 (en) | 2002-02-18 | 2003-02-18 | Bus systems and method for reconfiguration |
PCT/DE2003/000942 WO2003081454A2 (en) | 2002-03-21 | 2003-03-21 | Method and device for data processing |
EP03720231A EP1518186A2 (en) | 2002-03-21 | 2003-03-21 | Method and device for data processing |
Publications (1)
Publication Number | Publication Date |
---|---|
EP1518186A2 (en) | 2005-03-30 |
Family
ID=56290401
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP03720231A Ceased EP1518186A2 (en) | 2002-03-21 | 2003-03-21 | Method and device for data processing |
Country Status (4)
Country | Link |
---|---|
US (3) | US20060075211A1 (en) |
EP (1) | EP1518186A2 (en) |
AU (1) | AU2003223892A1 (en) |
WO (1) | WO2003081454A2 (en) |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AT501479B8 (en) * | 2003-12-17 | 2007-02-15 | On Demand Informationstechnolo | DIGITAL COMPUTER DEVICE |
US7669035B2 (en) * | 2004-01-21 | 2010-02-23 | The Charles Stark Draper Laboratory, Inc. | Systems and methods for reconfigurable computing |
US8966223B2 (en) * | 2005-05-05 | 2015-02-24 | Icera, Inc. | Apparatus and method for configurable processing |
US9081901B2 (en) * | 2007-10-31 | 2015-07-14 | Raytheon Company | Means of control for reconfigurable computers |
JP5373620B2 (en) * | 2007-11-09 | 2013-12-18 | パナソニック株式会社 | Data transfer control device, data transfer device, data transfer control method, and semiconductor integrated circuit using reconfiguration circuit |
US9003165B2 (en) * | 2008-12-09 | 2015-04-07 | Shlomo Selim Rakib | Address generation unit using end point patterns to scan multi-dimensional data structures |
EP2831693B1 (en) | 2012-03-30 | 2018-06-13 | Intel Corporation | Apparatus and method for accelerating operations in a processor which uses shared virtual memory |
US9471433B2 (en) | 2014-03-19 | 2016-10-18 | Lenovo Enterprise Solutions (Singapore) Pte. Ltd. | Optimizing computer hardware usage in a computing system that includes a plurality of populated central processing unit (‘CPU’) sockets |
US9471329B2 (en) | 2014-03-19 | 2016-10-18 | Lenovo Enterprise Solutions (Singapore) Pte. Ltd. | Optimizing computer hardware usage in a computing system that includes a plurality of populated central processing unit (‘CPU’) sockets |
JP2016178229A (en) | 2015-03-20 | 2016-10-06 | 株式会社東芝 | Reconfigurable circuit |
GB2536658B (en) * | 2015-03-24 | 2017-03-22 | Imagination Tech Ltd | Controlling data flow between processors in a processing system |
US10353709B2 (en) * | 2017-09-13 | 2019-07-16 | Nextera Video, Inc. | Digital signal processing array using integrated processing elements |
US10426424B2 (en) | 2017-11-21 | 2019-10-01 | General Electric Company | System and method for generating and performing imaging protocol simulations |
FR3086409A1 (en) * | 2018-09-26 | 2020-03-27 | Stmicroelectronics (Grenoble 2) Sas | METHOD FOR MANAGING THE PROVISION OF INFORMATION, PARTICULARLY INSTRUCTIONS, TO A MICROPROCESSOR AND CORRESPONDING SYSTEM |
US11803507B2 (en) | 2018-10-29 | 2023-10-31 | Secturion Systems, Inc. | Data stream protocol field decoding by a systolic array |
CN111124514B (en) * | 2019-12-19 | 2023-03-28 | 杭州迪普科技股份有限公司 | Method and system for realizing loose coupling of frame type equipment service plates and frame type equipment |
CN117435259B (en) * | 2023-12-20 | 2024-03-22 | 芯瞳半导体技术(山东)有限公司 | VPU configuration method and device, electronic equipment and computer readable storage medium |
Family Cites Families (143)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US2067477A (en) * | 1931-03-20 | 1937-01-12 | Allis Chalmers Mfg Co | Gearing |
GB971191A (en) * | 1962-05-28 | 1964-09-30 | Wolf Electric Tools Ltd | Improvements relating to electrically driven equipment |
US3564506A (en) * | 1968-01-17 | 1971-02-16 | Ibm | Instruction retry byte counter |
US5459846A (en) * | 1988-12-02 | 1995-10-17 | Hyatt; Gilbert P. | Computer architecture system having an imporved memory |
US4498134A (en) * | 1982-01-26 | 1985-02-05 | Hughes Aircraft Company | Segregator functional plane for use in a modular array processor |
US4498172A (en) * | 1982-07-26 | 1985-02-05 | General Electric Company | System for polynomial division self-testing of digital networks |
US4566102A (en) * | 1983-04-18 | 1986-01-21 | International Business Machines Corporation | Parallel-shift error reconfiguration |
US4571736A (en) * | 1983-10-31 | 1986-02-18 | University Of Southwestern Louisiana | Digital communication system employing differential coding and sample robbing |
US4646300A (en) * | 1983-11-14 | 1987-02-24 | Tandem Computers Incorporated | Communications method |
US4720778A (en) * | 1985-01-31 | 1988-01-19 | Hewlett Packard Company | Software debugging analyzer |
US5225719A (en) * | 1985-03-29 | 1993-07-06 | Advanced Micro Devices, Inc. | Family of multiple segmented programmable logic blocks interconnected by a high speed centralized switch matrix |
US4720780A (en) * | 1985-09-17 | 1988-01-19 | The Johns Hopkins University | Memory-linked wavefront array processor |
US4910665A (en) * | 1986-09-02 | 1990-03-20 | General Electric Company | Distributed processing system including reconfigurable elements |
US5367208A (en) * | 1986-09-19 | 1994-11-22 | Actel Corporation | Reconfigurable programmable interconnect architecture |
GB2211638A (en) * | 1987-10-27 | 1989-07-05 | Ibm | Simd array processor |
FR2606184B1 (en) * | 1986-10-31 | 1991-11-29 | Thomson Csf | RECONFIGURABLE CALCULATION DEVICE |
US4811214A (en) * | 1986-11-14 | 1989-03-07 | Princeton University | Multinode reconfigurable pipeline computer |
US5081575A (en) * | 1987-11-06 | 1992-01-14 | Oryx Corporation | Highly parallel computer architecture employing crossbar switch with selectable pipeline delay |
US5055999A (en) * | 1987-12-22 | 1991-10-08 | Kendall Square Research Corporation | Multiprocessor digital data processing system |
US5287511A (en) * | 1988-07-11 | 1994-02-15 | Star Semiconductor Corporation | Architectures and methods for dividing processing tasks into tasks for a programmable real time signal processor and tasks for a decision making microprocessor interfacing therewith |
US4901268A (en) * | 1988-08-19 | 1990-02-13 | General Electric Company | Multiple function data processor |
US5081375A (en) * | 1989-01-19 | 1992-01-14 | National Semiconductor Corp. | Method for operating a multiple page programmable logic device |
GB8906145D0 (en) * | 1989-03-17 | 1989-05-04 | Algotronix Ltd | Configurable cellular array |
US5203005A (en) * | 1989-05-02 | 1993-04-13 | Horst Robert W | Cell structure for linear array wafer scale integration architecture with capability to open boundary i/o bus without neighbor acknowledgement |
CA2021192A1 (en) * | 1989-07-28 | 1991-01-29 | Malcolm A. Mumme | Simplified synchronous mesh processor |
US5489857A (en) * | 1992-08-03 | 1996-02-06 | Advanced Micro Devices, Inc. | Flexible synchronous/asynchronous cell structure for a high density programmable logic device |
GB8925723D0 (en) * | 1989-11-14 | 1990-01-04 | Amt Holdings | Processor array system |
US5099447A (en) * | 1990-01-22 | 1992-03-24 | Alliant Computer Systems Corporation | Blocked matrix multiplication for computers with hierarchical memory |
US5483620A (en) * | 1990-05-22 | 1996-01-09 | International Business Machines Corp. | Learning machine synapse processor system apparatus |
US5193202A (en) * | 1990-05-29 | 1993-03-09 | Wavetracer, Inc. | Processor array with relocated operand physical address generator capable of data transfer to distant physical processor for each virtual processor while simulating dimensionally larger array processor |
US5752067A (en) * | 1990-11-13 | 1998-05-12 | International Business Machines Corporation | Fully scalable parallel processing system having asynchronous SIMD processing |
US5734921A (en) * | 1990-11-13 | 1998-03-31 | International Business Machines Corporation | Advanced parallel array processor computer package |
US5590345A (en) * | 1990-11-13 | 1996-12-31 | International Business Machines Corporation | Advanced parallel array processor(APAP) |
CA2051029C (en) * | 1990-11-30 | 1996-11-05 | Pradeep S. Sindhu | Arbitration of packet switched busses, including busses for shared memory multiprocessors |
US5276836A (en) * | 1991-01-10 | 1994-01-04 | Hitachi, Ltd. | Data processing device with common memory connecting mechanism |
JPH04328657A (en) * | 1991-04-30 | 1992-11-17 | Toshiba Corp | Cache memory |
US5260610A (en) * | 1991-09-03 | 1993-11-09 | Altera Corporation | Programmable logic element interconnections for programmable logic array integrated circuits |
FR2681791B1 (en) * | 1991-09-27 | 1994-05-06 | Salomon Sa | VIBRATION DAMPING DEVICE FOR A GOLF CLUB. |
JP2791243B2 (en) * | 1992-03-13 | 1998-08-27 | 株式会社東芝 | Hierarchical synchronization system and large scale integrated circuit using the same |
US5493663A (en) * | 1992-04-22 | 1996-02-20 | International Business Machines Corporation | Method and apparatus for predetermining pages for swapping from physical memory in accordance with the number of accesses |
US5611049A (en) * | 1992-06-03 | 1997-03-11 | Pitts; William M. | System for accessing distributed data cache channel at each network node to pass requests and data |
US5386154A (en) * | 1992-07-23 | 1995-01-31 | Xilinx, Inc. | Compact logic cell for field programmable gate array chip |
US5581778A (en) * | 1992-08-05 | 1996-12-03 | David Sarnoff Research Center | Advanced massively parallel computer using a field of the instruction to selectively enable the profiling counter to increase its value in response to the system clock |
WO1994003901A1 (en) * | 1992-08-10 | 1994-02-17 | Monolithic System Technology, Inc. | Fault-tolerant, high-speed bus system and bus interface for wafer-scale integration |
US5497498A (en) * | 1992-11-05 | 1996-03-05 | Giga Operations Corporation | Video processing module using a second programmable logic device which reconfigures a first programmable logic device for data transformation |
US5857109A (en) * | 1992-11-05 | 1999-01-05 | Giga Operations Corporation | Programmable logic device for real time video processing |
US5392437A (en) * | 1992-11-06 | 1995-02-21 | Intel Corporation | Method and apparatus for independently stopping and restarting functional units |
US5386518A (en) * | 1993-02-12 | 1995-01-31 | Hughes Aircraft Company | Reconfigurable computer interface and method |
US5596742A (en) * | 1993-04-02 | 1997-01-21 | Massachusetts Institute Of Technology | Virtual interconnections for reconfigurable logic systems |
WO1994025917A1 (en) * | 1993-04-26 | 1994-11-10 | Comdisco Systems, Inc. | Method for scheduling synchronous data flow graphs |
US5896551A (en) * | 1994-04-15 | 1999-04-20 | Micron Technology, Inc. | Initializing and reprogramming circuitry for state independent memory array burst operations control |
US5600845A (en) * | 1994-07-27 | 1997-02-04 | Metalithic Systems Incorporated | Integrated circuit computing device comprising a dynamically configurable gate array having a microprocessor and reconfigurable instruction execution means and method therefor |
US5603005A (en) * | 1994-12-27 | 1997-02-11 | Unisys Corporation | Cache coherency scheme for XBAR storage structure with delayed invalidates until associated write request is executed |
US5493239A (en) * | 1995-01-31 | 1996-02-20 | Motorola, Inc. | Circuit and method of configuring a field programmable gate array |
EP0727750B1 (en) * | 1995-02-17 | 2004-05-12 | Kabushiki Kaisha Toshiba | Continuous data server apparatus and data transfer scheme enabling multiple simultaneous data accesses |
JP3313007B2 (en) * | 1995-04-14 | 2002-08-12 | 三菱電機株式会社 | Microcomputer |
US5933642A (en) * | 1995-04-17 | 1999-08-03 | Ricoh Corporation | Compiling system and method for reconfigurable computing |
EP0823091A1 (en) * | 1995-04-28 | 1998-02-11 | Xilinx, Inc. | Microprocessor with distributed registers accessible by programmable logic device |
GB9508931D0 (en) * | 1995-05-02 | 1995-06-21 | Xilinx Inc | Programmable switch for FPGA input/output signals |
US5600597A (en) * | 1995-05-02 | 1997-02-04 | Xilinx, Inc. | Register protection structure for FPGA |
JPH08328941A (en) * | 1995-05-31 | 1996-12-13 | Nec Corp | Memory access control circuit |
JP3677315B2 (en) * | 1995-06-01 | 2005-07-27 | シャープ株式会社 | Data-driven information processing device |
US5889982A (en) * | 1995-07-01 | 1999-03-30 | Intel Corporation | Method and apparatus for generating event handler vectors based on both operating mode and event type |
US5784313A (en) * | 1995-08-18 | 1998-07-21 | Xilinx, Inc. | Programmable logic device including configuration data or user data memory slices |
US5943242A (en) * | 1995-11-17 | 1999-08-24 | Pact Gmbh | Dynamically reconfigurable data processing system |
US5732209A (en) * | 1995-11-29 | 1998-03-24 | Exponential Technology, Inc. | Self-testing multi-processor die with internal compare points |
US7266725B2 (en) * | 2001-09-03 | 2007-09-04 | Pact Xpp Technologies Ag | Method for debugging reconfigurable architectures |
KR0165515B1 (en) * | 1996-02-17 | 1999-01-15 | 김광호 | Fifo method and apparatus of graphic data |
US6020758A (en) * | 1996-03-11 | 2000-02-01 | Altera Corporation | Partially reconfigurable programmable logic device |
US6173434B1 (en) * | 1996-04-22 | 2001-01-09 | Brigham Young University | Dynamically-configurable digital processor using method for relocating logic array modules |
US5894565A (en) * | 1996-05-20 | 1999-04-13 | Atmel Corporation | Field programmable gate array with distributed RAM and increased cell utilization |
JP2000513523A (en) * | 1996-06-21 | 2000-10-10 | オーガニック システムズ インコーポレイテッド | Dynamically reconfigurable hardware system for immediate process control |
US6023742A (en) * | 1996-07-18 | 2000-02-08 | University Of Washington | Reconfigurable computing architecture for providing pipelined data paths |
US6023564A (en) * | 1996-07-19 | 2000-02-08 | Xilinx, Inc. | Data processing system using a flash reconfigurable logic device as a dynamic execution unit for a sequence of instructions |
US5859544A (en) * | 1996-09-05 | 1999-01-12 | Altera Corporation | Dynamic configurable elements for programmable logic devices |
US6178494B1 (en) * | 1996-09-23 | 2001-01-23 | Virtual Computer Corporation | Modular, hybrid processor and method for producing a modular, hybrid processor |
US6167486A (en) * | 1996-11-18 | 2000-12-26 | Nec Electronics, Inc. | Parallel access virtual channel memory system with cacheable channels |
US5860119A (en) * | 1996-11-25 | 1999-01-12 | Vlsi Technology, Inc. | Data-packet fifo buffer system with end-of-packet flags |
DE19654593A1 (en) * | 1996-12-20 | 1998-07-02 | Pact Inf Tech Gmbh | Reconfiguration procedure for programmable blocks at runtime |
US6338106B1 (en) * | 1996-12-20 | 2002-01-08 | Pact Gmbh | I/O and memory bus system for DFPS and units with two or multi-dimensional programmable cell architectures |
DE19654595A1 (en) * | 1996-12-20 | 1998-07-02 | Pact Inf Tech Gmbh | I0 and memory bus system for DFPs as well as building blocks with two- or multi-dimensional programmable cell structures |
DE19704044A1 (en) * | 1997-02-04 | 1998-08-13 | Pact Inf Tech Gmbh | Address generation with systems having programmable modules |
US5865239A (en) * | 1997-02-05 | 1999-02-02 | Micropump, Inc. | Method for making herringbone gears |
DE19704728A1 (en) * | 1997-02-08 | 1998-08-13 | Pact Inf Tech Gmbh | Method for self-synchronization of configurable elements of a programmable module |
US5857097A (en) * | 1997-03-10 | 1999-01-05 | Digital Equipment Corporation | Method for identifying reasons for dynamic stall cycles during the execution of a program |
US5884075A (en) * | 1997-03-10 | 1999-03-16 | Compaq Computer Corporation | Conflict resolution using self-contained virtual devices |
US6272257B1 (en) * | 1997-04-30 | 2001-08-07 | Canon Kabushiki Kaisha | Decoder of variable length codes |
US6035371A (en) * | 1997-05-28 | 2000-03-07 | 3Com Corporation | Method and apparatus for addressing a static random access memory device based on signals for addressing a dynamic memory access device |
US6011407A (en) * | 1997-06-13 | 2000-01-04 | Xilinx, Inc. | Field programmable gate array with dedicated computer bus interface and method for configuring both |
US5966534A (en) * | 1997-06-27 | 1999-10-12 | Cooke; Laurence H. | Method for compiling high level programming languages into an integrated processor with reconfigurable logic |
US6020760A (en) * | 1997-07-16 | 2000-02-01 | Altera Corporation | I/O buffer circuit with pin multiplexing |
US6170051B1 (en) * | 1997-08-01 | 2001-01-02 | Micron Technology, Inc. | Apparatus and method for program level parallelism in a VLIW processor |
US6026478A (en) * | 1997-08-01 | 2000-02-15 | Micron Technology, Inc. | Split embedded DRAM processor |
US6038656A (en) * | 1997-09-12 | 2000-03-14 | California Institute Of Technology | Pipelined completion for asynchronous communication |
SG82587A1 (en) * | 1997-10-21 | 2001-08-21 | Sony Corp | Recording apparatus, recording method, playback apparatus, playback method, recording/playback apparatus, recording/playback method, presentation medium and recording medium |
JPH11147335A (en) * | 1997-11-18 | 1999-06-02 | Fuji Xerox Co Ltd | Plot process apparatus |
JP4197755B2 (en) * | 1997-11-19 | 2008-12-17 | 富士通株式会社 | Signal transmission system, receiver circuit of the signal transmission system, and semiconductor memory device to which the signal transmission system is applied |
DE69841256D1 (en) * | 1997-12-17 | 2009-12-10 | Panasonic Corp | Command masking for routing command streams to a processor |
DE69827589T2 (en) * | 1997-12-17 | 2005-11-03 | Elixent Ltd. | Configurable processing assembly and method of using this assembly to build a central processing unit |
DE19861088A1 (en) * | 1997-12-22 | 2000-02-10 | Pact Inf Tech Gmbh | Repairing integrated circuits by replacing subassemblies with substitutes |
US6172520B1 (en) * | 1997-12-30 | 2001-01-09 | Xilinx, Inc. | FPGA system with user-programmable configuration ports and method for reconfiguring the FPGA |
US6105106A (en) * | 1997-12-31 | 2000-08-15 | Micron Technology, Inc. | Computer system, memory device and shift register including a balanced switching circuit with series connected transfer gates which are selectively clocked for fast switching times |
US6034538A (en) * | 1998-01-21 | 2000-03-07 | Lucent Technologies Inc. | Virtual logic system for reconfigurable hardware |
US6198304B1 (en) * | 1998-02-23 | 2001-03-06 | Xilinx, Inc. | Programmable logic device |
DE19807872A1 (en) * | 1998-02-25 | 1999-08-26 | Pact Inf Tech Gmbh | Method of managing configuration data in data flow processors |
US6173419B1 (en) * | 1998-05-14 | 2001-01-09 | Advanced Technology Materials, Inc. | Field programmable gate array (FPGA) emulator for debugging software |
JP3123977B2 (en) * | 1998-06-04 | 2001-01-15 | 日本電気株式会社 | Programmable function block |
US6202182B1 (en) * | 1998-06-30 | 2001-03-13 | Lucent Technologies Inc. | Method and apparatus for testing field programmable gate arrays |
US6272594B1 (en) * | 1998-07-31 | 2001-08-07 | Hewlett-Packard Company | Method and apparatus for determining interleaving schemes in a computer system that supports multiple interleaving schemes |
US6137307A (en) * | 1998-08-04 | 2000-10-24 | Xilinx, Inc. | Structure and method for loading wide frames of data from a narrow input bus |
JP3551353B2 (en) * | 1998-10-02 | 2004-08-04 | 株式会社日立製作所 | Data relocation method |
US6044030A (en) * | 1998-12-21 | 2000-03-28 | Philips Electronics North America Corporation | FIFO unit with single pointer |
US6694434B1 (en) * | 1998-12-23 | 2004-02-17 | Entrust Technologies Limited | Method and apparatus for controlling program execution and program distribution |
US6381715B1 (en) * | 1998-12-31 | 2002-04-30 | Unisys Corporation | System and method for performing parallel initialization and testing of multiple memory banks and interfaces in a shared memory module |
US6191614B1 (en) * | 1999-04-05 | 2001-02-20 | Xilinx, Inc. | FPGA configuration circuit including bus-based CRC register |
US7007096B1 (en) * | 1999-05-12 | 2006-02-28 | Microsoft Corporation | Efficient splitting and mixing of streaming-data frames for processing through multiple processing modules |
US6211697B1 (en) * | 1999-05-25 | 2001-04-03 | Actel | Integrated circuit that includes a field-programmable gate array and a hard gate array having the same underlying structure |
US6347346B1 (en) * | 1999-06-30 | 2002-02-12 | Chameleon Systems, Inc. | Local memory unit system with global access for use on reconfigurable chips |
US6341318B1 (en) * | 1999-08-10 | 2002-01-22 | Chameleon Systems, Inc. | DMA data streaming |
US6204687B1 (en) * | 1999-08-13 | 2001-03-20 | Xilinx, Inc. | Method and structure for configuring FPGAS |
US6507947B1 (en) * | 1999-08-20 | 2003-01-14 | Hewlett-Packard Company | Programmatic synthesis of processor element arrays |
US6349346B1 (en) * | 1999-09-23 | 2002-02-19 | Chameleon Systems, Inc. | Control fabric unit including associated configuration memory and PSOP state machine adapted to provide configuration address to reconfigurable functional unit |
US6625654B1 (en) * | 1999-12-28 | 2003-09-23 | Intel Corporation | Thread signaling in multi-threaded network processor |
US6519674B1 (en) * | 2000-02-18 | 2003-02-11 | Chameleon Systems, Inc. | Configuration bits layout |
US6845445B2 (en) * | 2000-05-12 | 2005-01-18 | Pts Corporation | Methods and apparatus for power control in a scalable array of processor elements |
US6362650B1 (en) * | 2000-05-18 | 2002-03-26 | Xilinx, Inc. | Method and apparatus for incorporating a multiplier into an FPGA |
EP1342158B1 (en) * | 2000-06-13 | 2010-08-04 | Richter, Thomas | Pipeline configuration unit protocols and communication |
US6711407B1 (en) * | 2000-07-13 | 2004-03-23 | Motorola, Inc. | Array of processors architecture for a space-based network router |
DE60041444D1 (en) * | 2000-08-21 | 2009-03-12 | Texas Instruments Inc | microprocessor |
US6518787B1 (en) * | 2000-09-21 | 2003-02-11 | Triscend Corporation | Input/output architecture for efficient configuration of programmable input/output cells |
US6525678B1 (en) * | 2000-10-06 | 2003-02-25 | Altera Corporation | Configuring a programmable logic device |
US20040015899A1 (en) * | 2000-10-06 | 2004-01-22 | Frank May | Method for processing data |
US6636919B1 (en) * | 2000-10-16 | 2003-10-21 | Motorola, Inc. | Method for host protection during hot swap in a bridged, pipelined network |
US6493250B2 (en) * | 2000-12-28 | 2002-12-10 | Intel Corporation | Multi-tier point-to-point buffered memory interface |
US20020108021A1 (en) * | 2001-02-08 | 2002-08-08 | Syed Moinul I. | High performance cache and method for operating same |
US6847370B2 (en) * | 2001-02-20 | 2005-01-25 | 3D Labs, Inc., Ltd. | Planar byte memory organization with linear access |
US6976239B1 (en) * | 2001-06-12 | 2005-12-13 | Altera Corporation | Methods and apparatus for implementing parameterizable processors and peripherals |
JP3580785B2 (en) * | 2001-06-29 | 2004-10-27 | 株式会社半導体理工学研究センター | Look-up table, programmable logic circuit device having look-up table, and method of configuring look-up table |
US20030055861A1 (en) * | 2001-09-18 | 2003-03-20 | Lai Gary N. | Multipler unit in reconfigurable chip |
US20030052711A1 (en) * | 2001-09-19 | 2003-03-20 | Taylor Bradley L. | Despreader/correlator unit for use in reconfigurable chip |
US6757784B2 (en) * | 2001-09-28 | 2004-06-29 | Intel Corporation | Hiding refresh of memory and refresh-hidden memory |
US7000161B1 (en) * | 2001-10-15 | 2006-02-14 | Altera Corporation | Reconfigurable programmable logic system with configuration recovery mode |
US7873811B1 (en) * | 2003-03-10 | 2011-01-18 | The United States Of America As Represented By The United States Department Of Energy | Polymorphous computing fabric |
2003
- 2003-03-21 US application US10/508,559, published as US20060075211A1 (Abandoned)
- 2003-03-21 AU application AU2003223892A, published as AU2003223892A1 (Abandoned)
- 2003-03-21 EP application EP03720231A, published as EP1518186A2 (Ceased)
- 2003-03-21 WO application PCT/DE2003/000942, published as WO2003081454A2 (Application Discontinuation)
2010
- 2010-03-22 US application US12/729,090, published as US20100174868A1 (Abandoned)
2014
- 2014-11-13 US application US14/540,782, published as US20150074352A1 (Abandoned)
Non-Patent Citations (1)
Title |
---|
ZHIYUAN LI ET AL: "Configuration prefetching techniques for partial reconfigurable coprocessor with relocation and defragmentation", INTERNATIONAL SYMPOSIUM ON FIELD PROGRAMMABLE GATE ARRAYS, XX, XX, 1 February 2002 (2002-02-01), pages 187 - 195, XP002349321 * |
Also Published As
Publication number | Publication date |
---|---|
AU2003223892A8 (en) | 2003-10-08 |
WO2003081454A2 (en) | 2003-10-02 |
US20060075211A1 (en) | 2006-04-06 |
WO2003081454A3 (en) | 2005-01-27 |
US20100174868A1 (en) | 2010-07-08 |
AU2003223892A1 (en) | 2003-10-08 |
US20150074352A1 (en) | 2015-03-12 |
WO2003081454A8 (en) | 2004-02-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP2224330B1 (en) | Method and device for partitioning large computer programs | |
WO2003081454A2 (en) | Method and device for data processing | |
DE102018005181B4 (en) | PROCESSOR FOR A CONFIGURABLE SPATIAL ACCELERATOR WITH PERFORMANCE, ACCURACY AND ENERGY REDUCTION CHARACTERISTICS | |
DE69826700T2 (en) | COMPUTER-ORIENTED DEVICE FOR PARALLEL COMPUTERIZATION, SIMULATION AND EXECUTION OF COMPUTER PROGRAMS AND HARDWARE MODELS | |
EP1228440B1 (en) | Sequence partitioning in cell structures | |
EP0961980B1 (en) | Method for self-synchronization of configurable elements of a programmable component | |
DE102018006735A1 (en) | Processor and method for configurable clock gating in a spatial array | |
EP1057117B1 (en) | METHOD FOR CACHEING CONFIGURATION DATA OF DATA FLOW PROCESSORS AND MODULES WITH A TWO- OR MULTIDIMENSIONAL PROGRAMMABLE CELL STRUCTURE (FPGAs, DPGAs OR SIMILAR) ACCORDING TO A HIERARCHY | |
EP1146432B1 (en) | Reconfiguration method for programmable components during runtime | |
DE69909829T2 (en) | MULTIPLE PROCESSOR FOR THREAD SOFTWARE APPLICATIONS | |
DE102018005216A1 (en) | Processors, methods and systems for a configurable spatial accelerator with transaction and repetition features | |
DE102018006791A1 (en) | Processors, methods and systems having a configurable spatial accelerator with a sequencer data flow operator | |
DE102018005169A1 (en) | PROCESSORS AND METHODS FOR CONFIGURABLE NETWORK-BASED DATA FLUID OPERATOR CIRCUITS | |
DE102018006889A1 (en) | Processors and methods for preferred layout in a spatial array | |
DE10028397A1 (en) | Registration method in operating a reconfigurable unit, involves evaluating acknowledgement signals of configurable cells with time offset to configuration | |
EP0943129A1 (en) | Unit for processing numeric and logical operations, for use in processors (cpus) and in multicomputer systems | |
DE19815865A1 (en) | Software compilation method for reconfigurable computer compiler | |
Nguyen et al. | PR-HMPSoC: A versatile partially reconfigurable heterogeneous Multiprocessor System-on-Chip for dynamic FPGA-based embedded systems | |
WO2003017095A2 (en) | Method for the translation of programs for reconfigurable architectures | |
WO2003060747A2 (en) | Reconfigurable processor | |
DE68928300T2 (en) | Method and apparatus for pipeline instruction execution | |
US20110161977A1 (en) | Method and device for data processing | |
EP1116129A2 (en) | Configurable hardware block | |
US20140143509A1 (en) | Method and device for data processing | |
EP1493084A2 (en) | Method for the translation of programs for reconfigurable architectures |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PUAI | Public reference made under Article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012 |
| | AK | Designated contracting states | Kind code of ref document: A2; Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LI LU MC NL PT RO SE SI SK TR |
| | AX | Request for extension of the European patent | Extension state: AL LT LV MK |
| | 17P | Request for examination filed | Effective date: 20050722 |
| | 17Q | First examination report despatched | Effective date: 20080507 |
| | RIC1 | Information provided on IPC code assigned before grant | Ipc: G06F 12/08 20060101ALI20080428BHEP; Ipc: G06F 9/445 20060101ALI20080428BHEP; Ipc: G06F 15/78 20060101AFI20031007BHEP |
| | RAP1 | Party data changed (applicant data changed or rights of an application transferred) | Owner name: KRASS, MAREN; Owner name: RICHTER, THOMAS |
| | RAP1 | Party data changed (applicant data changed or rights of an application transferred) | Owner name: PACT XPP TECHNOLOGIES AG |
| | 111L | Licence recorded | Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR; Name of requester: XILINX, INC., US; Effective date: 20141010 |
| | REG | Reference to a national code | Ref country code: DE; Ref legal event code: R003 |
| | STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: THE APPLICATION HAS BEEN REFUSED |
| | 18R | Application refused | Effective date: 20150609 |