WO2003007155A1 - Integrated circuit device - Google Patents
Integrated circuit device
- Publication number
- WO2003007155A1 (PCT/JP2002/007076)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- memory
- output
- input
- address
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/78—Architectures of general purpose stored program computers comprising a single central processing unit
- G06F15/7867—Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/78—Architectures of general purpose stored program computers comprising a single central processing unit
- G06F15/7807—System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
- G06F15/781—On-chip cache; Off-chip memory
Definitions
- the present invention relates to an integrated circuit device capable of reconfiguring a data flow.
- When instructions and data (hereinafter referred to simply as data when there is no need to distinguish between them) are processed using a CPU or the like, a relatively small-capacity but high-speed memory called a cache or cache memory is used. It exploits the temporal or spatial locality of data to improve data access speed. Therefore, an integrated circuit device called a VLSI, system LSI, or system ASIC that incorporates a processor or processor core has a cache system comprising a cache memory and a circuit, such as an MMU, that controls the cache memory.
- MMU Memory Management Unit
- TLB Translation Look-aside Buffer
- If the data is in the cache memory, it is input/output between the cache memory and the CPU core. If the data is not in the cache memory, the virtual address is converted to a physical address by the MMU and TLB, input/output occurs with external memory, and the data in the cache memory is also updated. The cache control mechanism equipped with the MMU thus makes the cache memory transparent to the software running on the CPU core. For this reason, software only needs to be developed to operate on virtual addresses that do not depend on hardware, which reduces the time and cost of development and design. In addition, the same software can be run on different hardware, so software assets can be used effectively.
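The conventional lookup described above can be sketched as follows. This is a minimal illustration only, not the mechanism of any particular MMU: the class name `TransparentCache`, the 4096-byte page size, and the dictionary-based page table are all assumptions introduced for the example.

```python
# Hypothetical sketch of a conventional transparent cache; all names are illustrative.
PAGE_SIZE = 4096  # assumed page size, not taken from the source


class TransparentCache:
    def __init__(self, page_table, external_memory):
        self.page_table = page_table            # virtual page -> physical page (MMU/TLB role)
        self.external_memory = external_memory  # physical address -> data
        self.lines = {}                         # virtual address -> cached data
        self.misses = 0

    def translate(self, vaddr):
        # Convert a virtual address to a physical address.
        page, offset = divmod(vaddr, PAGE_SIZE)
        return self.page_table[page] * PAGE_SIZE + offset

    def read(self, vaddr):
        # Hit: serve from the cache. Miss: translate, fetch, and update the cache.
        if vaddr not in self.lines:
            self.misses += 1
            paddr = self.translate(vaddr)
            self.lines[vaddr] = self.external_memory[paddr]
        return self.lines[vaddr]


external_memory = {5 * PAGE_SIZE + 16: "payload"}
cache = TransparentCache(page_table={0: 5}, external_memory=external_memory)
first = cache.read(16)   # miss: goes to external memory
second = cache.read(16)  # hit: served from the cache
```

The software side only ever passes a virtual address to `read`, which is the sense in which the cache is transparent.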
- the cache memory simply becomes an overhead, which adversely affects the execution time of the program.
- Technologies such as separating the instruction cache from the data cache, increasing the number of cache levels, and prefetching by hardware or by software are being studied.
- a multi-level cache is effective when there is a large difference in access time and storage capacity between the cache and external memory.
- Increasing the number of levels inevitably increases the number of memory accesses, so there is always a possibility of added overhead depending on conditions such as the software configuration and the input/output media of the data to be processed.
- Any of these technologies can improve the cache memory hit rate if conditions such as the software executed on the CPU and the media on which the data is stored match the cache memory method.
- Cache memory is hardware that is placed between the CPU and the external memory. Therefore, if the processing content of the software to be executed and the hardware environment storing the data to be processed by the software differ from what the cache assumes, the expected cache efficiency may not be obtained, or the overhead may increase. This lowers the execution speed of the processor.
- A processor specialized for a particular application may be able to use the optimal cache memory system. However, a processor that aims at a certain degree of versatility adopts a cache memory system that, while not very effective, does not cause much overhead. Therefore, even with a cache memory system, performance does not improve much.
- FPGA Field Programmable Gate Array
- This is an integrated circuit device that can be used.
- integrated circuit devices that can change the structure of the data path using a medium-sized single-structured basic function unit that performs various processes according to the instruction set are being studied.
- The applicant of the present application has developed a processing unit including a plurality of types of dedicated processing elements, each having an internal data path suitable for a different specific process, and a wiring group connecting these dedicated processing elements.
- In the present invention, a circuit that controls the cache memory is configured by part of a processing unit whose data flows can be changed or reconfigured. That is, the integrated circuit device of the present invention comprises a first memory capable of inputting and/or outputting data to and from a second memory, and a processing unit in which at least one data flow is formed and at least part of at least one data flow can be changed.
- The processing unit comprises: a data processing section for processing data input and/or output to and from the first memory; a first address output section for outputting a first address of data input and/or output between the first memory and the data processing section; and a second address output section for outputting a second address of data input and/or output between the first memory and the second memory.
- Through the hardware configuration of the data processing section or the software executed in the data processing section, it is possible to change the data flow of the first address output section or the second address output section and to control the output of each section. Therefore, a cache memory system most suitable for the processing executed by the integrated circuit device can be configured in the integrated circuit device. Alternatively, the control circuit of the cache memory can be configured in the integrated circuit device so as to control the cache memory in the way most suitable for the processing executed by the integrated circuit device.
- The first memory serving as the cache memory can also be controlled passively by a second address for the second memory, that is, a physical address of data in the second memory, or a logical or virtual address that can be converted to a physical address. With this control, the first memory can also be configured to exist transparently to the second memory and/or the data processing section.
- The second address output section can also actively control the input and output of data. It can also control data input/output operations between the first and second memories in parallel with the data processing section and the first address output section. Therefore, the second address output section can be configured to determine the data access destination of the data processing section and the first address output section, and instead of a cache that is transparent to a conventional CPU, it is possible to configure a cache that controls the processing in the processing unit.
- The conventional cache architecture is configured to provide a uniform and transparent interface so that execution speed can be improved on average for software running on a processing mechanism with a uniform hardware configuration, such as a CPU core or DSP core.
- a data processing section serving as a core is provided by an architecture such as an FPGA that can change the configuration of a data path itself.
- The cache configuration can also be changed dynamically to the optimal configuration for the data processing section and the software running on it. Therefore, the cache need not be uniformly transparent, and it is possible to provide the data processing section, which is the core or execution unit, with an interface or service completely different from that of a conventional cache.
- the first memory can be most efficiently used as a cache according to the processing contents of the software executed by the processing unit and the hardware environment.
- A cache system can be configured so that a high hit rate is obtained when executing various kinds of software, and an integrated circuit device can be provided in which cache memory input/output does not become an overhead when executing the software.
- The second address output section can independently prefetch data depending on the remaining amount of data in the first memory. Therefore, data can be prefetched into the first memory serving as a cache, by hardware or by software that controls the second address output section, without consuming the processing time of the data processing section.
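A minimal sketch of prefetching driven by the remaining amount of data, under stated assumptions: the class `PrefetchingAddressOutput`, the `low_water` threshold, and the `burst` size are hypothetical names and parameters introduced here, and the sequential loop only imitates what would be parallel hardware.

```python
from collections import deque


class PrefetchingAddressOutput:
    """Hypothetical model of a second address output section that prefetches."""

    def __init__(self, second_memory, start, low_water, burst):
        self.second_memory = second_memory  # backing store (list of data words)
        self.next_addr = start              # next second address to output
        self.low_water = low_water          # refill when fewer words remain (assumed policy)
        self.burst = burst                  # words fetched per refill (assumed policy)

    def step(self, first_memory):
        # Refill only when the remaining data falls below the threshold.
        if len(first_memory) >= self.low_water:
            return 0
        fetched = 0
        while fetched < self.burst and self.next_addr < len(self.second_memory):
            first_memory.append(self.second_memory[self.next_addr])
            self.next_addr += 1
            fetched += 1
        return fetched


second_memory = list(range(100, 110))
first_memory = deque()
prefetcher = PrefetchingAddressOutput(second_memory, start=0, low_water=2, burst=4)

consumed = []
while True:
    prefetcher.step(first_memory)            # address side: decides on its own when to fetch
    if not first_memory:
        break                                # nothing left anywhere
    consumed.append(first_memory.popleft())  # data processing side: only consumes
```

The consuming loop never computes a second-memory address itself; the refill decision depends only on how much data remains in the first memory.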
- The first address output section outputs, as the first address, the address of the first memory, that is, the physical address of the first memory or a virtual or logical address that can be converted to that physical address. The second address output section outputs, as the second address, the address of the second memory, that is, the physical address of the second memory or a virtual or logical address that can be converted to that physical address.
- the hardware or software can be configured so that processing proceeds at the address of the first memory serving as the cache memory.
- It is desirable that the second address output section be operable asynchronously, i.e., independently, of the data processing section and/or the first address output section, so that prefetching can be performed in parallel with, and independently of, the data processing section.
- It is desirable that the first memory be provided with a plurality of storage sections that can be input and output asynchronously, i.e., independently, for example a plurality of memory banks, so that inputs and outputs to and from the second memory can be processed independently in parallel.
- The second address output section can output the second address based on the data in the first memory, either independently or in combination with the data processing section, so that data processing by indirect addressing can be performed without restrictions.
- It is desirable that the first memory operating as a cache include a first input memory that stores data input to the data processing section and a first output memory that stores data output from the data processing section. The input and output of data to and from the data flow formed in the data processing section can then be controlled independently. The address of the first memory is output from the first address output section, but if the first memory holds no data corresponding to the first address, or has no space for storing data corresponding to the first address, the processing of the data flow formed in the data processing section is obstructed. Therefore, it is desirable to provide a first arbitration unit that manages the input and/or output between the first memory and the data processing section.
- The first arbitration unit can be provided with a function that outputs a stop signal to the data processing section when a condition for input or output with the data processing section is not satisfied, for example when there is no data corresponding to the first address or no space for storing data corresponding to the first address.
- The data processing section is provided with a function to stop processing of at least one data path or data flow formed in the data processing section in response to the stop signal, so that the first arbitration unit can control the on/off of the data path or data flow by the stop signal. Therefore, control in which a data path or data flow formed in the data processing section is activated only after the data to be processed has been prepared can be realized easily.
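The stop-signal behavior can be illustrated with a small cycle-by-cycle sketch. The names `InputArbiter` and `run_data_flow`, the cycle model, and the doubling operation standing in for the configured processing are all assumptions made for this example.

```python
class InputArbiter:
    """Hypothetical first arbitration unit for one input memory."""

    def __init__(self):
        self.memory = {}  # first address -> data

    def write(self, addr, data):
        self.memory[addr] = data

    def read(self, addr):
        # Returns (stop, data); stop is True when no data exists for the address.
        if addr not in self.memory:
            return True, None
        return False, self.memory[addr]


def run_data_flow(arbiter, addresses, cycles, arrivals):
    """Process addresses in order, stalling on stop. arrivals maps cycle -> (addr, data)."""
    results, stalls, index = [], 0, 0
    for cycle in range(cycles):
        if cycle in arrivals:
            arbiter.write(*arrivals[cycle])  # data arriving from the second memory
        if index == len(addresses):
            break
        stop, data = arbiter.read(addresses[index])
        if stop:
            stalls += 1                      # the data flow is switched off this cycle
            continue
        results.append(data * 2)             # stand-in for the configured processing
        index += 1
    return results, stalls


results, stalls = run_data_flow(
    InputArbiter(), addresses=[0, 1], cycles=10, arrivals={0: (0, 10), 3: (1, 20)}
)
```

In this run the data for address 1 arrives only at cycle 3, so the data flow waits for two cycles and then resumes without any involvement from the processing logic itself.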
- It is desirable that the first arbitration unit have a first input arbitration unit that manages the transfer of data from the first input memory to the data processing section and a first output arbitration unit that manages the transfer of data from the data processing section to the first output memory. Control of the data flow formed in the data processing section can then be performed independently on the input side and the output side.
- The first arbitration unit can have a function of independently managing each of the plurality of storage sections. Each of the plurality of data flows formed in the data processing section can then be independently controlled by the first arbitration unit according to the state of the corresponding storage section. Alternatively, the first arbitration unit can be provided with a function of managing a plurality of storage sections in association with each other. This makes it easy to realize control in which a data flow formed in the data processing section preferentially processes data input from the external memory into a predetermined storage section, or in which output from a data flow is sent to the external memory preferentially via a predetermined storage section.
- It is desirable that a plurality of first memories be provided and that first and second address output sections corresponding to the respective first memories be formed in the processing unit. This makes it possible to configure a multi-level cache by appropriately configuring the data processing section and the first address output section.
- a plurality of first memories may be selectively used as an instruction cache and a data cache. By using a plurality of first memories separately, the data cached in each first memory can be appropriately controlled by the second address output section.
- It is desirable to provide a second arbitration unit that manages input and output between the second memory and the plurality of first memories, and to supply the second address to the second arbitration unit.
- the integrated circuit device of the present invention can access the external memory as in the conventional case.
- The second memory can input and/or output data to and from a third memory.
- a third address output means for outputting a third address of data to be input and / or output to / from the second memory, thereby making the cache memory multi-layered.
- the third memory is an external memory
- a cache memory is constituted by the first and second memories.
- The third address output means may be a conventional cache control mechanism such as an MMU. It can also be configured in the same way as the second address output section. The same applies when controlling memory at a fourth or higher level (including not only ROM and RAM but also various types of recording media such as disks).
- The processing unit capable of changing or reconfiguring the data flow may be one provided with a plurality of logic elements of a single type whose functions can be changed and a group of wires connecting these logic elements, that is, the FPGA described above, or one in which the data path structure can be changed using medium-sized basic function units of a single structure. It is also possible to employ a processing unit that includes a plurality of types of dedicated processing elements, each having an internal data path suited to a different specific process, and a wiring group that connects these dedicated processing elements. With such a reconfigurable processing unit, a dedicated processing element having an internal data path suited to outputting addresses can be incorporated in advance, which improves the efficiency of address generation and further improves processing speed. In addition, because redundant circuit elements are reduced, fewer elements need to be selected to change the data flow, AC characteristics improve, and space efficiency increases.
- A control unit instructs the processing unit to change at least part of its data flow so as to configure the data processing section, the first address output section, and the second address output section.
- the process to instruct the data flow can be changed flexibly and dynamically in a short time.
- a compact and economical integrated circuit device having a flexible cache system can be provided.
- The control unit can reconfigure the data flow by rewriting the contents of the configuration memory or by instructing a change to at least part of the data flow of the processing unit.
- This control unit can instruct changes to the data flow of the data processing section, the first address output section, or the second address output section asynchronously or independently.
- the dedicated processing elements constituting the data processing section and / or the first address output section constitute another data flow for another purpose.
- The dedicated processing elements of the second address output section can be used to control a different memory, or for a different purpose, while processing is performed in the data processing section. This makes it possible to utilize the resources of the processing unit flexibly and efficiently.
- FIG. 1 is a block diagram showing a schematic configuration of an integrated circuit device according to an embodiment of the present invention.
- FIG. 2 is a diagram showing a schematic configuration of the processing unit AAP.
- FIG. 3 is a diagram showing a schematic configuration of the matrix section.
- FIG. 4 shows an example of a data path section suitable for processing for outputting an end address.
- FIG. 5 is a diagram showing a configuration of an address generation circuit of the data path unit shown in FIG.
- FIG. 6 is a diagram showing a configuration of the counter shown in FIG.
- FIG. 7 is a diagram showing a configuration of an address generation circuit different from FIG.
- FIG. 8 is a diagram showing a state in which a large-capacity RAM is controlled as an external memory.
- FIG. 9 is a diagram illustrating a state in which a large-capacity RAM and peripheral devices are controlled as an external memory.
- FIG. 10 is a diagram showing how a plurality of large-capacity RAMs and peripheral devices are controlled as external memories.
- FIG. 11 is a diagram illustrating a state in which a large-capacity RAM is controlled as an external memory by a different integrated circuit device of the present invention.
- FIG. 1 shows a schematic configuration of a system LSI 10 according to the present invention.
- This LSI 10 is a data processing system including: a general-purpose processor section (hereinafter referred to as a basic processor or processor) 11 that performs general-purpose processing, including error processing, based on an instruction set given by a program or the like; an AAP (Advanced Application Processor) section or AAP unit (hereinafter AAP) 20 in which a data flow or pseudo data flow adapted to specific data processing is variably formed by arithmetic or logical elements arranged in a matrix; an interrupt control unit 12 that controls interrupt processing from the AAP 20; a clock generator 13 that supplies an operating clock signal to the AAP 20; an FPGA section 14 that further enhances the flexibility of the arithmetic circuits that the LSI 10 can provide; and a bus control unit 15.
- The FPGA section 14 is an interface with an FPGA chip provided outside the LSI 10, and is hereinafter referred to as an off-chip FPGA or an FPGA.
- The basic processor 11 and the AAP 20 are connected by a data bus 17 capable of exchanging data between the basic processor 11 and the AAP 20, and by an instruction bus 18 through which the basic processor 11 controls the configuration and operation of the AAP 20. Also, an interrupt signal is supplied from the AAP 20 to the interrupt control unit 12 via the signal line 19, so that when the processing in the AAP 20 is completed, or when an error occurs during the processing, the state of the AAP 20 can be fed back to the basic processor 11.
- The AAP 20 and the FPGA 14 are also connected by the data bus 21, so that data can be supplied from the AAP 20 to the FPGA 14 for processing and the result returned to the AAP 20. Further, the AAP 20 is connected to the bus control unit 15 by a load bus 22 and a storage bus 23, so that data can be exchanged with a data bus outside the LSI 10. Therefore, the AAP 20 can input data from the external DRAM 2 or another device, and can output the result of processing that data to the external device again.
- the basic processor 11 can also input and output data to and from external devices via the data bus 11a and the bus control unit 15.
- FIG. 2 shows an outline of the AAP unit 20.
- The AAP unit 20 of this example includes a matrix section 28 in which a plurality of logic blocks, logic units or logic elements (hereinafter, elements) for performing arithmetic and/or logic operations are arranged in a matrix, an input buffer 26 for supplying data to the matrix section 28, and an output buffer 27 for storing data output from the matrix section 28.
- The input buffer 26 and the output buffer 27 each include four small-capacity memories: input memories (RAMs) 26a to 26d and output memories (RAMs) 27a to 27d, respectively.
- The AAP 20 further includes an external access arbitration unit (second arbitration unit) 25 that controls data input/output operations between the bus control unit 15 and the input buffer 26 and output buffer 27 composed of the plurality of memories.
- Each of the input RAMs 26a to 26d and the output RAMs 27a to 27d in this example functions as a 1-kbyte 2-port RAM, and can be used as a 2-bank RAM, with banks 81 and 82, that is 64 bits wide and 512 bytes deep. Therefore, by using different banks for input to and output from the memory, input and output can be processed as independent operations.
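The two-bank arrangement can be sketched as a ping-pong buffer. This is an illustration under assumptions: the class `TwoBankRam` and its methods are hypothetical, and the word size is taken as 64 bits (8 bytes) to match the 64-bit write data described for these RAMs, which gives 64 words per 512-byte bank.

```python
BANK_BYTES = 512
WORD_BYTES = 8                             # assumes a 64-bit word
WORDS_PER_BANK = BANK_BYTES // WORD_BYTES  # words that fit in one bank


class TwoBankRam:
    """Hypothetical two-bank RAM: one bank is filled while the other is read."""

    def __init__(self):
        self.banks = [[], []]
        self.fill_bank = 0  # bank currently written from the external side

    def write_word(self, word):
        bank = self.banks[self.fill_bank]
        if len(bank) >= WORDS_PER_BANK:
            raise BufferError("fill bank is full")
        bank.append(word)

    def swap(self):
        # Hand the filled bank to the reader and start filling the other one.
        self.fill_bank ^= 1
        self.banks[self.fill_bank] = []

    def read_bank(self):
        # The bank that is not being filled is the one available for reading.
        return self.banks[self.fill_bank ^ 1]


ram = TwoBankRam()
for w in range(WORDS_PER_BANK):
    ram.write_word(w)
ram.swap()
ram.write_word(999)  # the external side fills bank 1 while bank 0 is being read
```

Because the reader and the writer always touch different banks, neither has to wait for the other until a swap is needed.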
- An arbitration unit (first arbitration unit) 85 that manages input and output to and from RAMs 81 and 82 is provided, so that the full and empty status of each bank can be checked by counting the number of inputs and outputs.
- A plurality of types of control signals are exchanged between the matrix unit 28 and each RAM and arbitration unit 85.
- 16-bit input read address data (ira, first address) 61 for controlling the data that the matrix unit 28 reads from the input RAMs 26a to 26d is output.
- the input read address 61 is a logical or physical address of each of the input RAMs 26a to 26d.
- An input read address stop signal 62, which controls the supply of address data 61 according to full and/or empty status, is output from the arbitration unit 85 of each input RAM 26a to 26d to the matrix unit 28.
- the arbitration unit 85 also outputs an input read address stop signal 62 when input conditions for the matrix unit 28 are not satisfied, such as when there is no data corresponding to the address data 61 supplied from the matrix unit 28.
- The data flow formed in the matrix section 28 is turned on/off by the stop signal 62. Therefore, in the execution phase after a data flow has been formed in the matrix section 28, execution of the processing defined by that data flow can be controlled by the arbitration unit 85 of each of the input RAMs 26a to 26d. If there is no data corresponding to the input read address data 61 in the input RAM 26, the processing of the data flow enters a wait state. If there is data corresponding to the input read address data 61 in the input RAM 26, the input read data (ird) 63 is supplied to the matrix section 28, processed by the formed data flow, and output to one of the output RAMs 27.
- A stop signal (ird_stop) 64 for controlling the input read data 63 is output from the matrix section 28 to each of the input RAMs 26a to 26d, so that reading stops when the operation of the data flow in the matrix section 28 stops, for example because of a factor on the output side.
- The arbitration unit 85 of each input RAM 26a to 26d basically manages its own RAM independently. However, the arbitration units 85 of the input RAMs 26a to 26d can be connected by wiring between the input RAMs 26a to 26d or by wiring through the matrix unit 28, so the RAMs 26a to 26d can also be managed in association with each other. By managing a plurality of input RAMs 26a to 26d in association, a plurality of input RAMs can be allocated to one data flow formed in the matrix unit 28. The arbitration units 85 can then prioritize the plurality of input RAMs 26a to 26d and supply data to the data flow starting from the RAM with the highest priority.
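Prioritized supply from several input RAMs can be sketched as below. The function `select_input` and the fixed priority list are assumptions for illustration; the source does not specify how priorities are encoded.

```python
from collections import deque


def select_input(rams, priority):
    """Return (name, data) from the highest-priority non-empty RAM, or None to stall."""
    for name in priority:
        if rams[name]:
            return name, rams[name].popleft()
    return None


# Two input RAMs allocated to one data flow; 26b is given priority over 26a here.
rams = {"26a": deque([1, 2]), "26b": deque([10])}
order = ["26b", "26a"]

picked = []
while (item := select_input(rams, order)) is not None:
    picked.append(item)
```

A `None` result corresponds to the case in which every associated RAM is empty, where the data flow would have to wait.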
- 32-bit input write address data (iwa, second address) 65, for controlling data to be read from the external memory 2 via the bus control unit 15 and written to each of the input RAMs 26a to 26d, and a 4-bit control signal (iwd_type) 66 that can specify the data type and the like are output from the matrix unit 28.
- the input write address data 65 and the control signal 66 corresponding to each of the input RAMs 26a to 26d are all output to the external access arbitration unit 25.
- The input write address 65 is a physical address of the DRAM 2, which is an external memory, or a logical or virtual address corresponding to the physical address.
- A stop signal (iwa_stop) 67 for controlling the output of the address data 65 is supplied from the external access arbitration unit 25 to the matrix unit 28.
- 64-bit input write data (iwd) 68 corresponding to the input write address data 65 supplied to the external access arbitration unit 25 is supplied from the arbitration unit 25 to each of the input RAMs 26a to 26d.
- A stop signal (iwd_stop) 69 for controlling the input write data 68 is output from the RAMs 26a to 26d to the external access arbitration unit 25.
- Output write address data (owa, first address) 71 is output.
- the output write address 71 is a logical or physical address of each output RAM 27a to 27d.
- An output write address stop signal (owa_stop) 72, which controls the supply of address data 71 according to full and/or empty status, is output from the arbitration unit 85 of each output RAM 27a to 27d to the matrix unit 28. That is, the arbitration unit 85 outputs the output write address stop signal 72 when the condition for receiving the output from the matrix section 28 is not satisfied.
- The data flow formed in the matrix unit 28 is turned on/off by the stop signal 72, and the execution of the process defined by the data flow is controlled. If the output RAM 27 has space, 32-bit output write data (owd) 73 is output from the matrix section 28 together with the output write address data 71. Further, a stop signal (owd_stop) 74 for controlling the output write data 73 is supplied from the arbitration unit 85 of each of the output RAMs 27a to 27d to the matrix unit 28.
- 32-bit output read address data (ora, second address) 75, for controlling the data that is read from each output RAM 27a to 27d and written to the external memory 2 via the bus control unit 15, and a 4-bit control signal (ord_type) 76 capable of designating the data type and the like are output from the matrix unit 28.
- the output read address data 75 and the control signal 76 are all output to the external access arbitration unit 25.
- the output read address 75 is a physical address of the DRAM 2 which is an external memory, or a logical or virtual address corresponding to the physical address.
- A stop signal (ora_stop) 77 for controlling the output of the address data 75 is supplied from the external access arbitration unit 25 to the matrix unit 28.
- 64-bit output read data (ord) 78 is supplied from each output RAM 27a to 27d to the external access arbitration unit 25, and a stop signal (ord_stop) 79 for controlling the output read data 78 is supplied from the external access arbitration unit 25 to each output RAM 27a to 27d.
- The input data 63 of the matrix unit 28 is supplied from the bus control unit 15, which serves as an interface with the external memory 2, through the external access arbitration unit 25 and the plurality of input RAMs 26a to 26d.
- The output data 73 of the matrix unit 28 is supplied to the bus control unit 15, which serves as an interface with the external memory 2, via the plurality of output RAMs 27a to 27d and the external access arbitration unit 25. Since each of the input RAMs 26a to 26d and the output RAMs 27a to 27d has a two-bank configuration, processing between these RAMs and the matrix unit 28 and processing between these RAMs and the external access arbitration unit 25, that is, processing with the external memory 2, can be executed independently or asynchronously and in parallel.
- The load bus 22 and the storage bus 23 between the external access arbitration unit 25 and the bus control unit 15 are each configured with a 32-bit address bus and a 256-bit data bus so that data can be input/output in block units at high speed. An input address signal 22a and an output address signal 23a are transmitted via the address bus, and input data 22b and output data 23b are transmitted via the data bus. In addition, signal lines for transmitting 5-bit commands 22c and 23c, signal lines for transmitting busy signals 22d and 23d of the bus control unit 15, and a signal line for transmitting a ready signal 22e of the bus control unit 15 are also provided.
- The matrix section 28 is a system capable of reconfiguring a data path or data flow, and corresponds to the processing unit of the present invention.
- The matrix section 28 includes a plurality of elements 30, which are arithmetic units, arranged in an array or matrix so that the elements 30 form four lines in the vertical direction.
- the matrix section 28 includes a row wiring group 51 extending in the horizontal direction and a column wiring group 52 extending in the vertical direction, arranged between these elements 30.
- The column wiring group 52 is composed of a pair of wiring groups 52x and 52y arranged separately on the left and right of the elements 30 arranged in the column direction. Data from these wiring groups 52x and 52y is supplied to each element 30.
- A switching unit 55 is disposed at each intersection of the row wiring group 51 and the column wiring group 52, and can switch to connect any channel of the row wiring group 51 to any channel of the column wiring group 52.
- Each of the switching units 55 has a configuration RAM for storing its settings. By rewriting the contents of the configuration RAM with data supplied from the processor unit 11, the connection between the row wiring group 51 and the column wiring group 52 can be dynamically and arbitrarily controlled. For this reason, in the matrix section 28 of the present example, the configuration of the data flow formed by connecting all or some of the plurality of elements 30 through the wiring groups 51 and 52 can be changed dynamically and arbitrarily.
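The role of the configuration RAM in a switching unit can be sketched as a small routing table. This is only an illustrative model: the class name `SwitchingUnit`, the representation of the configuration RAM as a dictionary, and the use of `None` for an unconnected channel are assumptions, not details taken from the description.

```python
class SwitchingUnit:
    """Hypothetical model of a switching unit 55 with its configuration RAM."""

    def __init__(self, n_row_channels, n_col_channels):
        self.n_row = n_row_channels
        self.n_col = n_col_channels
        # Assumed layout: column channel -> row channel that drives it.
        self.config_ram = {}

    def configure(self, connections):
        """Rewrite the configuration RAM, as the processor unit would."""
        for col, row in connections.items():
            if not (0 <= row < self.n_row and 0 <= col < self.n_col):
                raise ValueError("channel out of range")
        self.config_ram = dict(connections)

    def route(self, row_values):
        """Return the value seen on each column channel (None if unconnected)."""
        return [
            row_values[self.config_ram[c]] if c in self.config_ram else None
            for c in range(self.n_col)
        ]
```

Calling `configure` again with a different mapping changes the routing without touching the elements themselves, which mirrors how rewriting the configuration RAM changes the data flow.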
- Each element 30 is composed of a set of selectors 31 for selecting input data from each of the column wiring groups 52x and 52y, and an internal data path section 32 that performs specific arithmetic and/or logical operation processing on the selected input data dix and diy and outputs the result to the row wiring group 51 as output data do.
- Elements 30, each having an internal data path section 32 that performs processing that differs from row to row, are arranged side by side.
- These wiring groups 51 and 52 are also provided with wires for transmitting carry signals.
- The carry signal can be used as a carry or as a signal indicating true or false, and is used to control the arithmetic and logical operations performed in each element 30 or to transmit their results to other elements.
- The elements 30 arranged in the first row have a data path unit 32i suitable for receiving data from the input buffer 26.
- If it only accepts data, the load data path section (LD) 32i requires no logic gates; it receives data via the load bus 22 and outputs it to the row wiring group 51.
- When the load data path section 32i receives the stop signal 62 from the RAM arbitration unit 85 of the input RAM 26, it has a function of stopping the processing of the data flow connected to the element 30 containing this data path section 32i. Further, when the data flow connected to the element of the data path unit 32i is stopped because of a factor internal to the matrix unit 28 or a factor on the output side, the data path unit outputs a stop signal 64 to the arbitration unit 85 of the corresponding input RAM 26.
- The element 30a arranged in the second row is an element for writing data from the external RAM 2 to each of the input RAMs 26a to 26d of the input buffer 26, and corresponds to the second address output section. It therefore has a data path unit 32a with an internal data path suitable for generating an address (second address) for block loading.
- This data path section 32a is called a BLA (Background Load Address Generator).
- FIG. 4 is an example of the data path section 32a, which includes an address generation circuit 38 made up of counters and the like; the address is output from the address generation circuit 38 as an output signal do.
- The output signal do is supplied to the data path unit 32 as an input signal dix or diy via the row wiring group 51 and the column wiring group 52, either as it is or after being processed by another element 30.
- One of the supplied addresses is selected by the selector SEL and is output from the matrix section 28 to the access arbitration unit 25, via the flip-flop FF, as the input write address 65.
- Each counter 38a combines an arithmetic logic unit ALU 38c and a comparator 38d, and the ALU 38c can be set to perform ADD, SUB, bit shift, OR, XOR, or a combination of these operations. The counter therefore works as a function generating circuit that produces a value at each clock.
- The function of this counter 38a can be set from the processor unit 11 via the configuration RAM 39.
- The control signal en of the ALU 38c can be set by the carry signal cy supplied from another counter 38a, and the output of the comparator 38d can be transmitted to another counter 38a as the carry signal cy.
- By using the carry signal in this way, the state of one counter 38a can be set according to the state of another counter 38a, and an arbitrary address can be generated.
- The control signal en of the counter 38a can also be set by a carry signal cy supplied from another element 30, and can be transmitted to another element 30.
- The element (BLA) 30a that outputs the input write address 65 thus has a configuration suited to address generation, with the address generation circuit 38 as its internal data path 32a. The content of the address generation processing can be controlled from the processor 11 through the configuration RAM 39, and the association with other elements 30 can also be set freely.
- The plurality of counters 38a included in the BLA 32a are, for example, 32-bit counters, and generate addresses for DMA transfer from the external memory 2 to the RAMs 26a to 26d, which are local store buffers.
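How counters chained by a carry signal can produce a block address pattern is sketched below. This is a simplified software model under stated assumptions: the ALU is fixed to ADD, the comparator resets the counter when a limit is reached, and the names `Counter` and `block_addresses` are hypothetical. The real counters are configured through the configuration RAM 39 and are not limited to this behaviour.

```python
class Counter:
    """Hypothetical model of one counter 38a: an ALU step plus a comparator."""

    def __init__(self, start, step, limit):
        self.start, self.step, self.limit = start, step, limit
        self.value = start

    def clock(self, en=True):
        """Advance one clock if enabled and return the carry signal cy."""
        if not en:
            return False
        nxt = self.value + self.step  # ALU assumed to be configured as ADD
        if nxt >= self.limit:         # comparator detects the end of the range
            self.value = self.start
            return True
        self.value = nxt
        return False


def block_addresses(base, row_stride, rows, cols):
    """Generate addresses for a rows x cols block with two chained counters."""
    col = Counter(0, 1, cols)
    row = Counter(0, row_stride, row_stride * rows)
    out = []
    for _ in range(rows * cols):
        out.append(base + row.value + col.value)
        cy = col.clock()   # inner counter runs every clock
        row.clock(en=cy)   # outer counter is enabled by the inner carry
    return out
```

For example, `block_addresses(0x100, 16, 2, 3)` walks three consecutive words in each of two rows that are 16 addresses apart.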
- The element 30b arranged in the third row of FIG. 3 includes a data path section 32b that generates an input read address 61 for loading desired data from each of the input RAMs 26a to 26d into the matrix section 28, and corresponds to the first address output section.
- The data path section 32b is called an LDA (Load Address Generator).
- The configuration of the data path section 32b is basically the same as that of the internal data path section 32a for address generation described above, except that the output address is 16 bits rather than 32 bits. Therefore, the basic configuration of the data path section 32b is as shown in FIG.
- The address generation circuit 38 of the LDA includes four 16-bit counters 38a and generates addresses for transferring data from the RAMs 26a to 26d, which are local store buffers, to the matrix unit 28.
- The control signal en of the counter 38a can be set by a carry signal cy supplied from another element 30, and can be transmitted to another element 30.
- The input read address 61 output from this element 30 causes data to be supplied from the input RAMs 26a to 26d to the matrix unit 28, where it is processed by the other logic and arithmetic elements constituting the matrix unit 28.
- The elements 30c arranged in the fourth and fifth rows have a data path section (SMA) 32c suitable for arithmetic and logical operations.
- The data path unit 32c includes, for example, a shift circuit, a mask circuit, a logical operation unit ALU, and a configuration RAM 39 for setting the operation to be processed by the ALU. Therefore, according to the instruction written by the processor 11, the input data dix and diy can be added, subtracted, compared, ORed, or ANDed, and the result is output as an output signal do.
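A minimal sketch of such a shift, mask, and ALU data path follows. The ordering (shift, then mask, then ALU), the 32-bit result width, and the names `sma` and `ALU_OPS` are assumptions for illustration only; the description states which circuits are present but not how they are wired.

```python
import operator

# Operations the ALU is assumed to select through the configuration RAM.
ALU_OPS = {
    "ADD": operator.add,
    "SUB": operator.sub,
    "OR": operator.or_,
    "AND": operator.and_,
}

WORD_MASK = 0xFFFFFFFF  # assumed 32-bit data width


def sma(dix, diy, op, shift=0, mask=WORD_MASK):
    """Shift and mask dix, then combine it with diy using the selected op."""
    x = (dix << shift) & mask
    return ALU_OPS[op](x, diy) & WORD_MASK
```

Here the `op`, `shift`, and `mask` arguments stand in for the settings that the processor would write into the configuration RAM.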
- The element 30d arranged in the lower row has a data path section (DEL) 32d suitable for processing that delays the timing at which data is transmitted.
- In this DEL data path section, a data path composed of a combination of a plurality of selectors and flip-flops FF is prepared. When the input signals dix and diy pass through the path selected by the selectors based on the data of the configuration RAM 39, they are output as output signals dox and doy with a delay of an arbitrary number of clocks.
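The delay behaviour can be modelled as a chain of flip-flops whose length is chosen by the configuration. The class `DelayPath` below is a hypothetical sketch that handles a single signal and represents an empty flip-flop as `None`; it is not the selector and flip-flop network of the DEL element itself.

```python
from collections import deque


class DelayPath:
    """Hypothetical model of a delay path with a configurable clock count."""

    def __init__(self, delay_clocks):
        if delay_clocks < 1:
            raise ValueError("delay must be at least one clock")
        # One entry per flip-flop in the selected path; None means "no data yet".
        self.ffs = deque([None] * delay_clocks)

    def clock(self, di):
        """Shift one input in and return the value leaving the last flip-flop."""
        self.ffs.append(di)
        return self.ffs.popleft()
```

With `DelayPath(2)`, a value presented on one clock appears at the output two clocks later.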
- The element 30e arranged in the lower row has a data path unit (MUL) 32e suitable for multiplication processing, including a multiplier and the like.
- As a further different element 30f, an element having a data path section 32f for interfacing with the FPGA 14 provided outside the matrix section 28 is also prepared, so that data can be supplied to the FPGA 14, processed there, and then returned to the matrix section 28 to continue the processing.
- Elements 30g and 30h having data path sections 32g and 32h, respectively, suitable for generating addresses for storing are also provided.
- These data path sections 32g and 32h have basically the same configuration as the address-generating data path sections 32b and 32a, respectively, described above with reference to FIGS. 4 to 7.
- The element 30g including the data path section 32g is a first address output section, and outputs an output write address 71 for writing data output from the matrix 28 to the output RAMs 27a to 27d. The data output from the data processing sequence constituted by the above-described types of elements 30c to 30f is thereby written to the output RAMs 27a to 27d.
- The data path section 32g is called an STA (Store Address Generator) and has the same configuration as the LDA 32b.
- The element 30h, which is arranged below the element (STA) 30g and has a data path section 32h, is a second address output section. It outputs an output read address 75 for reading the data of the output RAMs 27a to 27d and writing it to the external RAM 2, so that the data processed by the matrix unit 28 is written to the external RAM 2.
- The data path section 32h is called a BSA (Background Store Address Generator) and has the same configuration as the BLA 32a.
- Elements 30 each having a data path section 32s suitable for outputting data for storing are also arranged.
- The data path section 32s is called an ST, and a data path section having substantially the same configuration as the arithmetic operation data path section 32c can be employed for it.
- When the output data path unit 32s receives the stop signal 74 from the arbitration circuit 85 of the output RAM 27, it has a function of stopping the processing of the data flow connected to the output element 30.
- As described above, the matrix 28 of the present example includes an element 30a having an internal data path (BLA) 32a that generates an address for inputting (block loading) data from the external RAM 2 to the input RAMs 26a to 26d, and an element 30b having an internal data path (LDA) 32b that generates an address for inputting data from the input RAMs 26a to 26d to the matrix unit 28.
- It also includes an element 30g having an internal data path (STA) 32g that generates an address for writing data output from the matrix unit 28 to the output RAMs 27a to 27d, and an element 30h having an internal data path (BSA) 32h that generates an address for outputting the data of the output RAMs 27a to 27d to the external RAM 2 in blocks.
- Each of these elements 30a, 30b, 30g, and 30h has a data path suitable for generating an address as described above, and its configuration or function can be changed by rewriting the data of the configuration RAM 39.
- The connection environment with the other elements 30 of the matrix unit 28 can also be changed by changing the connections of the wiring groups 51 and 52. Therefore, address generation data can be supplied from the processor 11 or from other elements 30 of the matrix unit 28, and the timing of address generation can be controlled flexibly.
- Each of the plurality of input RAMs 26a to 26d is a storage section that can be input to and output from independently.
- Because the input RAMs 26a to 26d have a two-bank configuration, input to and output from the input RAMs 26a to 26d can be performed in parallel, so that data can be input to and output from them very efficiently.
- Each of the output RAMs 27a to 27d is likewise a storage section that can be input to and output from independently, and input to and output from the individual RAMs 27a to 27d can be performed independently and in parallel. Therefore, in this system, data can be input to and output from the RAMs 26a to 26d and 27a to 27d operating as a cache very efficiently.
- The matrix 28 of this example includes the elements 30a, 30b, 30g, and 30h having the data path sections 32a, 32b, 32g, and 32h suitable for generating addresses, and the operation of each of them is basically determined by instructions from the basic processor 11. That is, instructions supplied via the control bus 28 from the basic processor 11, which is the control unit, determine the circuit that accesses the RAMs 26a to 26d and 27a to 27d, which are the first memories, and the circuit that accesses the DRAM 2, which is the main memory (second memory).
- Furthermore, because the circuits that control access to these memories are configured within the matrix, it is extremely easy for their operation to reflect, directly or indirectly, the conditions inside the matrix 28, for example the data flow configuration or processing results, as well as the results of processing that uses other elements of the matrix 28.
- The elements 30a, 30b, 30g, and 30h suitable for generating addresses can, like the other elements, be freely wired to other elements of the matrix section 28 through the wiring groups 51 and 52. For this reason, the output of the elements 30a, 30b, 30g, and 30h can be controlled by changing their parameters or processing details through a data flow constituted by other elements that form the data processing section of the matrix section 28, or through software executed in that data processing section.
- Therefore, the method of accessing the RAMs 26a to 26d and 27a to 27d, which are the first memories constituting the cache system, and the method of accessing the DRAM 2, which is the main memory (second memory), can be determined flexibly according to the conditions inside the matrix 28, for example the configuration of the data flow or the processing results.
- Since the matrix section 28 can be reconfigured under the control of the basic processor 11, the internal data paths and functions of the elements 30a, 30b, 30g, and 30h that generate these addresses can be dynamically reconfigured, and their connections to other elements can also be dynamically reconfigured. Of course, a function for reconfiguring connections within and between elements can also be provided inside the matrix section 28. Therefore, when the connections of the other elements 30 of the matrix section 28 are changed according to the processing executed in the matrix section 28 in order to reconfigure the data flow or data path structure, the configuration for inputting and outputting data to and from the buffer 26 comprising the input RAMs and the buffer 27 comprising the output RAMs can also be changed.
- For example, before a data processing sequence is constituted by the other elements 30 in the matrix section 28, a data input structure suitable for that sequence can be realized and data loading started in advance; or, even after the data processing sequence has been reconfigured for other processing, the data output structure can be maintained and data output alone continued. Processing of this kind, which could not be considered in the past, can be executed extremely flexibly.
- That is, processing on the RAMs 26 and 27, which are the first memory, and on the DRAM 2, which is the second memory, can be performed freely and at any time, either dependent on or independent of the data flow formed by the other elements.
- For example, when the input RAM 26a becomes empty, the element 30a can output the input write address 65 to write data from the RAM 2, and when there is data in the input RAM 26a, the element 30b can load that data into the matrix unit 28. The elements 30a and 30b can thus operate independently and in parallel, and the data in the external RAM 2 can be prefetched to the input RAM 26a without wasting time. In addition, if the element 30a controls the address for inputting data from the external RAM 2, the data processing system composed of the element 30b and the matrix section 28 can proceed using only the addresses of the internal RAM 26a.
- If a data-flow-type processing system is defined by a plurality of other elements 30 in the matrix section 28, data processing can be performed in the matrix section 28 with data alone, excluding the end address.
- It is also possible for a virtual address to be output from the data processing sequence of the matrix section 28 and converted at the element 30b into a physical address of the input RAM 26a in order to supply the data; if the data is not in the input RAM 26a, the element 30a can convert the virtual address into a physical address of the external RAM 2 and load the data from the external RAM 2.
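The cooperation between the two address generators on a miss can be sketched as follows. This is only one possible arrangement: the direct-mapped placement, the tag list, the `loads` counter, and the name `InputRamCache` are assumptions introduced for illustration and are not specified in the description.

```python
class InputRamCache:
    """Hypothetical model: LDA-style lookup with BLA-style block load on a miss."""

    def __init__(self, external, block_size, n_blocks):
        self.external = external            # stands in for the external RAM 2
        self.block_size = block_size
        self.ram = [None] * (block_size * n_blocks)  # stands in for input RAM 26a
        self.tags = [None] * n_blocks       # external block held in each slot
        self.loads = 0                      # number of block loads performed

    def _block_load(self, block_no, slot):
        """Role of element 30a: copy one block from external memory."""
        bs = self.block_size
        self.ram[slot * bs:(slot + 1) * bs] = self.external[block_no * bs:(block_no + 1) * bs]
        self.tags[slot] = block_no
        self.loads += 1

    def read(self, vaddr):
        """Role of element 30b: translate a virtual address and supply the data."""
        block_no, offset = divmod(vaddr, self.block_size)
        slot = block_no % len(self.tags)    # assumed direct-mapped placement
        if self.tags[slot] != block_no:     # data not present in the input RAM
            self._block_load(block_no, slot)
        return self.ram[slot * self.block_size + offset]
```

Because both roles are ordinary reconfigurable elements in the matrix, the placement rule used here could be replaced by whatever mapping suits the software being executed.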
- The element (BLA) 30a can also be configured to generate an address from data input from the input RAM 26b and to use that address to load data from the external RAM 2 into the input RAM 26a. Therefore, complete indirect addressing control can be performed, independently of the data processing sequence configured in the matrix section 28, using only the mechanism that handles input to and output from the input RAM 26 or the output RAM 27. Further, by linking the plurality of input RAMs 26a to 26d, the output RAMs 27a to 27d, and the access arbitration unit 25, a cache structure with multiple hierarchical levels can be realized.
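Indirect addressing of this kind can be illustrated with a short sketch. The function name `indirect_block_load` and the interpretation of each stored word as the start address of a block are assumptions made for the example.

```python
def indirect_block_load(external, index_ram, block_size):
    """Treat each word in index_ram (a model of input RAM 26b) as a block
    start address in external memory and return the data that would be
    written to input RAM 26a."""
    loaded = []
    for base in index_ram:
        loaded.extend(external[base:base + block_size])
    return loaded
```

The addresses never pass through the data processing sequence; they are read from one RAM and used directly to fill another.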
- In the AAP 20 of this example, four input RAMs 26a to 26d and four output RAMs 27a to 27d are provided, corresponding to the arrangement of the elements 30 in four columns. These input RAMs 26a to 26d and output RAMs 27a to 27d can therefore be used as cache memories that individually correspond to a plurality of data processing sequences constituted by the other elements 30 of the matrix unit 28. For this reason, when a plurality of jobs or applications are executed in the matrix unit 28, the input RAMs 26a to 26d and the output RAMs 27a to 27d can each be used as the optimal cache for its job or application.
- Although the elements 30 are arranged in four columns, the number of data processing sequences constituted by the elements 30 is not limited to four.
- If the number of data processing sequences configured in the matrix section 28 is three or fewer, the cache memory capacity available to a sequence can be increased by assigning a plurality of the input RAMs 26a to 26d and the output RAMs 27a to 27d to one data processing sequence. If the number of data processing sequences is five or more, one RAM is allocated as cache memory to a plurality of data processing sequences; even in the worst case, however, the sequences sharing a RAM merely face the same situation as when a current CPU core performs multitasking and caches several tasks.
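The allocation rule described above can be sketched as a small function. The round-robin policy and the name `assign_rams` are assumptions; the description only states that several RAMs may serve one sequence, or one RAM may be shared by several sequences.

```python
def assign_rams(n_sequences, rams=("26a", "26b", "26c", "26d")):
    """Map each data processing sequence to the RAM(s) it uses as a cache."""
    if n_sequences < 1:
        raise ValueError("at least one data processing sequence is required")
    if n_sequences <= len(rams):
        # Few sequences: spread every RAM over them, so some get several RAMs.
        alloc = {s: [] for s in range(n_sequences)}
        for i, ram in enumerate(rams):
            alloc[i % n_sequences].append(ram)
        return alloc
    # More sequences than RAMs: several sequences share one RAM.
    return {s: [rams[s % len(rams)]] for s in range(n_sequences)}
```

With two sequences each receives two RAMs, while with five sequences the first and fifth share the same RAM.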
- The system LSI 10, which is the integrated circuit device or processing device of the present invention, has a structure or assembly 29 comprising the matrix unit, which is the processing unit, and small-capacity RAMs.
- The address output from the matrix unit to the external RAM 2 is supplied to the external RAM 2 via the arbitration circuit 25. Since the address generation mechanism that controls input to and output from the small-capacity RAMs is implemented in the matrix unit, whose data flow can be reconfigured, the architecture that controls the small-capacity RAMs functioning as cache memory can itself be reconfigured and changed to the configuration best suited to the software executed in the matrix unit.
- The small-capacity RAMs can therefore be used as cache memory in the most efficient way for the processing content of the software to be executed and the hardware environment.
- When various kinds of software are executed, the cache memory and the circuit that controls it can be configured so that a high hit rate is obtained, so an integrated circuit device or processing device such as a system LSI or ASIC can be provided in which input to and output from the cache memory does not become an overhead when the software is executed.
- The external memory controllable by the system LSI 10, that is, the second memory, is not limited to RAM.
- The external memory for the input RAMs or output RAMs is not limited to a storage device such as a RAM, a ROM, or a hard disk device; it includes every device to and from which data can be input and output by specifying an address.
- For example, when the LSI 10 controls the large-capacity RAM 2 and a peripheral device 3 such as a printer or display as external memory, the block-loading elements BLA 30a and BSA 30h of the matrix unit 28 may generate a physical address assigned to the peripheral device 3.
- Variations such as providing a plurality of arbitration circuits 25 are also possible. Further, the large-capacity RAM 2 can be mounted inside the LSI 10, and a configuration in which the large-capacity RAM 2 is used as a cache memory for the peripheral device 3 is also possible. It is also possible to use the large-capacity RAM 2 as the code RAM of the processor 11.
- The configuration of the matrix section 28 described above is an example, and the present invention is not limited to it.
- The elements described above, each with a specific internal data path 32 that performs an operation, are examples of elements having a data path suitable for specific processing such as address generation, arithmetic operation, logical operation, multiplication, and delay.
- Their functions and configurations are not limited to these examples.
- By arranging, in a matrix or array, elements having data paths with functions suited to the application executed by the LSI 10, which is the integrated circuit device or data processing device of the present invention, a processing unit whose data flow can be changed or reconfigured can be provided.
- There may be a plurality of matrix sections 28, and by arranging a plurality of matrix sections in a plane or three-dimensionally, an integrated circuit device having a larger number of elements can be constructed.
- The integrated circuit device of the present invention is not limited to an electronic circuit, but can also be applied to an optical circuit or an optoelectronic circuit.
- The present invention has been described using an example in which the AAP 20, the basic processor 11, and the bus control unit 15 are incorporated and provided as a system LSI 10, but the range provided as a single chip depends on conditions such as the application to be implemented.
- The AAP 20 can be provided as a single chip, or the range 29 including the RAMs 26 and 27 serving as caches and the matrix section 28 can be made into a chip. Further, it is also possible to provide a larger system LSI or ASIC including a plurality of AAPs or other dedicated circuits in addition to the basic processor 11.
- It is also possible to use an FPGA as the processing unit in place of the matrix unit 28.
- An FPGA is an architecture whose data path structure can be changed with versatility at the transistor level. Integrated circuit devices of this kind are also being studied.
- For a processing unit configured with such an architecture as well, the integrated circuit device or processing device of the present invention can be realized by providing, in addition to the data processing section, the first and second address output sections of the present invention, which allow the input RAM 26 and the output RAM 27 to function as a cache.
- The architecture based on the matrix unit described above comprises a plurality of types of elements with different internal data paths. Because it is not an architecture that requires versatility at the transistor level, mounting density can be improved and a compact, economical system can be provided.
- Because each element 30 has a data path section 32 dedicated to specific data processing, redundant configuration can be reduced as far as possible, and compared with an FPGA or another processing unit in which basic functional units of a single configuration are arranged, processing speed can be greatly increased and AC characteristics improved. Space efficiency is also improved, so a compact layout can be adopted and wiring lengths reduced. This architecture is therefore well suited to an integrated circuit device and processing device that reliably exploits the efficient cache structure disclosed in the present invention, and a processing device capable of high-speed processing can be provided at low cost.
- Furthermore, since only the combination of elements 30, each provided in advance with a data path section 32 suitable for specific processing, is changed, there is also the advantage that the configuration and function of the data processing unit, that is, the data processing sequence configured in the matrix unit 28, can be changed in a short time, in almost one clock.
- In each element 30, the functions of the selectors and of logic gates such as the ALU that constitute the data path section 32 can also be set independently by the processor 11 via the configuration memory 39, so the data path section 32 of each element 30 can be changed flexibly within the range of the function it serves. For this reason, the range of functions that can be executed by data-flow-type data processing in the matrix section 28 of this example is very wide.
- As described above, in the present invention, the first address output section and the second address output section, which control the first memory that can be used as a cache memory, are formed in a processing unit whose data flow can be changed.
- The configuration of the cache system can therefore be changed dynamically to the configuration best suited to the configuration of the data processing section and to the software executed there, and a high hit ratio can be obtained when executing various kinds of software.
- The processing unit and the integrated circuit device of the present invention can be provided as a system LSI or ASIC capable of executing various kinds of data processing. Further, the processing unit and the integrated circuit device of the present invention are not limited to electronic circuits, but can also be applied to optical circuits or optoelectronic circuits. Since the integrated circuit device of the present invention can execute data processing at high speed with reconfigurable hardware, it is suitable for data processing devices that require high speed and real-time performance, such as network processing and image processing.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Hardware Design (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Microelectronics & Electronic Packaging (AREA)
- Human Computer Interaction (AREA)
- Memory System Of A Hierarchy Structure (AREA)
- Logic Circuits (AREA)
Priority Applications (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2002318809A AU2002318809B2 (en) | 2001-07-12 | 2002-07-11 | Integrated circuit device |
KR1020047000422A KR100912437B1 (ko) | 2001-07-12 | 2002-07-11 | Integrated circuit device |
US10/363,885 US6868017B2 (en) | 2001-07-12 | 2002-07-11 | Integrated circuit device |
CA002451003A CA2451003A1 (en) | 2001-07-12 | 2002-07-11 | Integrated circuit device |
EP02745985A EP1416388A4 (en) | 2001-07-12 | 2002-07-11 | INTEGRATED CIRCUIT DEVICE |
JP2003512850A JP4188233B2 (ja) | 2001-07-12 | 2002-07-11 | Integrated circuit device |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2001212545 | 2001-07-12 | ||
JP2001-212545 | 2001-07-12 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2003007155A1 (fr) | 2003-01-23 |
Family
ID=19047692
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2002/007076 WO2003007155A1 (fr) | 2001-07-12 | 2002-07-11 | Dispositif a circuit integre |
Country Status (9)
Country | Link |
---|---|
US (1) | US6868017B2 (ja) |
EP (1) | EP1416388A4 (ja) |
JP (1) | JP4188233B2 (ja) |
KR (1) | KR100912437B1 (ja) |
CN (1) | CN1526100A (ja) |
AU (1) | AU2002318809B2 (ja) |
CA (1) | CA2451003A1 (ja) |
TW (1) | TW577020B (ja) |
WO (1) | WO2003007155A1 (ja) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2006164186A (ja) * | 2004-12-10 | 2006-06-22 | Ip Flex Kk | Integrated circuit debugging method and debugging program |
US7403235B2 (en) | 2003-01-24 | 2008-07-22 | Sony Corporation | Integrated circuit and information signal processing apparatus having multiple processing portions |
US7908453B2 (en) | 2004-06-30 | 2011-03-15 | Fujitsu Semiconductor Limited | Semiconductor device having a dynamically reconfigurable circuit configuration |
Families Citing this family (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6993674B2 (en) * | 2001-12-27 | 2006-01-31 | Pacific Design, Inc. | System LSI architecture and method for controlling the clock of a data processing system through the use of instructions |
US7197620B1 (en) * | 2002-12-10 | 2007-03-27 | Unisys Corporation | Sparse matrix paging system |
JP2005018626A (ja) | 2003-06-27 | 2005-01-20 | Ip Flex Kk | Method for generating a parallel processing system |
US20050283550A1 (en) * | 2004-06-18 | 2005-12-22 | Honeywell International Inc. | Method and architecture of a coupling system for microprocessors and logic devices |
US7746846B2 (en) * | 2004-07-15 | 2010-06-29 | Broadcom Corporation | Method and system for a gigabit Ethernet IP telephone chip with integrated security module |
US7493578B1 (en) * | 2005-03-18 | 2009-02-17 | Xilinx, Inc. | Correlation of data from design analysis tools with design blocks in a high-level modeling system |
US7496869B1 (en) | 2005-10-04 | 2009-02-24 | Xilinx, Inc. | Method and apparatus for implementing a program language description of a circuit design for an integrated circuit |
US7363599B1 (en) | 2005-10-04 | 2008-04-22 | Xilinx, Inc. | Method and system for matching a hierarchical identifier |
US8402409B1 (en) | 2006-03-10 | 2013-03-19 | Xilinx, Inc. | Method and apparatus for supporting run-time reconfiguration in a programmable logic integrated circuit |
US7380232B1 (en) | 2006-03-10 | 2008-05-27 | Xilinx, Inc. | Method and apparatus for designing a system for implementation in a programmable logic device |
US7761272B1 (en) | 2006-03-10 | 2010-07-20 | Xilinx, Inc. | Method and apparatus for processing a dataflow description of a digital processing system |
JP5605975B2 (ja) * | 2007-06-04 | 2014-10-15 | ピーエスフォー ルクスコ エスエイアールエル | Semiconductor device, method of manufacturing the same, and data processing system |
CN101727433B (zh) * | 2008-10-20 | 2012-04-25 | 北京大学深圳研究生院 | A processor structure |
CN101727434B (zh) * | 2008-10-20 | 2012-06-13 | 北京大学深圳研究生院 | An integrated circuit structure dedicated to a specific application algorithm |
KR101581882B1 (ko) * | 2009-04-20 | 2015-12-31 | 삼성전자주식회사 | Reconfigurable processor and reconfiguration method thereof |
US8134927B2 (en) * | 2009-07-31 | 2012-03-13 | Ixia | Apparatus and methods for capturing data packets from a network |
US9270542B2 (en) | 2009-07-31 | 2016-02-23 | Ixia | Apparatus and methods for forwarding data packets captured from a network |
WO2011066459A2 (en) * | 2009-11-25 | 2011-06-03 | Howard University | Multiple-memory application-specific digital signal processor |
EP2561645B1 (en) | 2010-04-23 | 2020-02-26 | Keysight Technologies Singapore (Sales) Pte. Ltd. | Integrated network data collection arrangement |
US8869123B2 (en) | 2011-06-24 | 2014-10-21 | Robert Keith Mykland | System and method for applying a sequence of operations code to program configurable logic circuitry |
US9158544B2 (en) | 2011-06-24 | 2015-10-13 | Robert Keith Mykland | System and method for performing a branch object conversion to program configurable logic circuitry |
US10089277B2 (en) | 2011-06-24 | 2018-10-02 | Robert Keith Mykland | Configurable circuit array |
US9304770B2 (en) | 2011-11-21 | 2016-04-05 | Robert Keith Mykland | Method and system adapted for converting software constructs into resources for implementation by a dynamically reconfigurable processor |
US9633160B2 (en) | 2012-06-11 | 2017-04-25 | Robert Keith Mykland | Method of placement and routing in a reconfiguration of a dynamically reconfigurable processor |
US10904075B2 (en) | 2012-07-02 | 2021-01-26 | Keysight Technologies Singapore (Sales) Pte. Ltd. | Preconfigured filters, dynamic updates and cloud based configurations in a network access switch |
US9081686B2 (en) * | 2012-11-19 | 2015-07-14 | Vmware, Inc. | Coordinated hypervisor staging of I/O data for storage devices on external cache devices |
KR20150127608A (ko) * | 2013-03-01 | 2015-11-17 | 아토나프 가부시키가이샤 | Data processing device and control method thereof |
US9275203B1 (en) | 2014-02-03 | 2016-03-01 | Purdue Research Foundation | Methods, systems, and computer readable media for preventing software piracy and protecting digital documents using same |
US9967150B2 (en) | 2014-04-30 | 2018-05-08 | Keysight Technologies Singapore (Holdings) Pte. Ltd. | Methods and apparatuses for implementing network visibility infrastructure |
US9571296B2 (en) | 2014-04-30 | 2017-02-14 | Ixia | Methods and apparatuses for abstracting filters in a network visibility infrastructure |
US10404459B2 (en) * | 2017-02-09 | 2019-09-03 | Intel Corporation | Technologies for elliptic curve cryptography hardware acceleration |
EP3796145B1 (en) * | 2019-09-19 | 2024-07-03 | MyScript | A method and correspond device for selecting graphical objects |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS649548A (en) * | 1987-07-01 | 1989-01-12 | Nec Corp | Cache memory device |
JPH01273132A (ja) * | 1988-04-25 | 1989-11-01 | Nec Corp | Microprocessor |
JPH11143774A (ja) * | 1997-11-06 | 1999-05-28 | Hitachi Ltd | Cache control mechanism |
JP2002163150A (ja) * | 2000-11-28 | 2002-06-07 | Toshiba Corp | Processor |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE4129614C2 (de) * | 1990-09-07 | 2002-03-21 | Hitachi Ltd | System and method for data processing |
JP3106998B2 (ja) * | 1997-04-11 | 2000-11-06 | NEC Corporation | Memory-added programmable logic LSI |
US6438737B1 (en) * | 2000-02-15 | 2002-08-20 | Intel Corporation | Reconfigurable logic for a computer |
US6417691B1 (en) * | 2000-08-29 | 2002-07-09 | Motorola, Inc. | Communication device with configurable module interface |
Filing date | Office | Application number | Publication | Language | Status |
---|---|---|---|---|---|
2002-07-11 | TW | TW091115475A | TW577020B | zh | Not active (IP Right Cessation) |
2002-07-11 | WO | PCT/JP2002/007076 | WO2003007155A1 | ja | Not active (Application Discontinuation) |
2002-07-11 | JP | JP2003512850A | JP4188233B2 | ja | Not active (Expired - Lifetime) |
2002-07-11 | US | US10/363,885 | US6868017B2 | en | Not active (Expired - Lifetime) |
2002-07-11 | CN | CNA028137671A | CN1526100A | zh | Active (Pending) |
2002-07-11 | KR | KR1020047000422A | KR100912437B1 | ko | Active (IP Right Grant) |
2002-07-11 | EP | EP02745985A | EP1416388A4 | en | Not active (Withdrawn) |
2002-07-11 | CA | CA002451003A | CA2451003A1 | en | Not active (Abandoned) |
2002-07-11 | AU | AU2002318809A | AU2002318809B2 | en | Not active (Ceased) |
Non-Patent Citations (3)
Title |
---|
COMPTON K.: "Reconfigurable computing: A survey of systems and software", ACM COMPUTING SURVEYS, vol. 34, no. 2, June 2002 (2002-06-01), pages 171-210, XP002957662 * |
KIM H.S. ET AL.: "A reconfigurable multi-function computing cache architecture", PROCEEDINGS OF ACM/SIGDA INTERNATIONAL SYMPOSIUM ON FIELD PROGRAMMABLE GATE ARRAYS, 2000, pages 85-94, XP000970735 * |
RANGANATHAN P. ET AL.: "Reconfigurable caches and their application to media processing", PROCEEDINGS OF THE 27TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA 27), June 2000 (2000-06-01), pages 214-224, XP000928730 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7403235B2 (en) | 2003-01-24 | 2008-07-22 | Sony Corporation | Integrated circuit and information signal processing apparatus having multiple processing portions |
US7908453B2 (en) | 2004-06-30 | 2011-03-15 | Fujitsu Semiconductor Limited | Semiconductor device having a dynamically reconfigurable circuit configuration |
JP2006164186A (ja) * | 2004-12-10 | 2006-06-22 | Ip Flex Kk | Debugging method for an integrated circuit, and debugging program |
JP4569284B2 (ja) * | 2004-12-10 | 2010-10-27 | Fuji Xerox Co., Ltd. | Debugging method for an integrated circuit, and debugging program |
Also Published As
Publication number | Publication date |
---|---|
CN1526100A (zh) | 2004-09-01 |
US20040015613A1 (en) | 2004-01-22 |
KR100912437B1 (ko) | 2009-08-14 |
TW577020B (en) | 2004-02-21 |
JP4188233B2 (ja) | 2008-11-26 |
AU2002318809B2 (en) | 2008-02-28 |
US6868017B2 (en) | 2005-03-15 |
EP1416388A1 (en) | 2004-05-06 |
CA2451003A1 (en) | 2003-01-23 |
JPWO2003007155A1 (ja) | 2004-11-04 |
EP1416388A4 (en) | 2006-02-08 |
KR20040017291A (ko) | 2004-02-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2003007155A1 (fr) | Integrated circuit device | |
Loh | 3D-stacked memory architectures for multi-core processors | |
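JP3853736B2 (ja) | User-configurable on-chip memory system | |
KR960016397B1 (ko) | File storage device and information processing apparatus using the same | |
CN104699631A (zh) | Multi-level cooperative and shared storage device and memory access method in a GPDSP | |
JP4497184B2 (ja) | Integrated device, layout method therefor, and program | |
JP2009514070A (ja) | Hardware acceleration system for logic simulation using a shift register as a local cache | |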
US6988167B2 (en) | Cache system with DMA capabilities and method for operating same | |
US6101589A (en) | High performance shared cache | |
US20210382691A1 (en) | In-Memory Near-Data Approximate Acceleration | |
CN109891397A (zh) | Apparatus and method for an operating system cache memory in a solid-state device | |
US6594711B1 (en) | Method and apparatus for operating one or more caches in conjunction with direct memory access controller | |
US6606684B1 (en) | Multi-tiered memory bank having different data buffer sizes with a programmable bank select | |
TWI825853B (zh) | Defect repair circuit for a reconfigurable data processor | |
US7765250B2 (en) | Data processor with internal memory structure for processing stream data | |
JPH1097464A (ja) | Information processing system | |
US6240487B1 (en) | Integrated cache buffers | |
Paul et al. | Energy-efficient hardware acceleration through computing in the memory | |
US20020108021A1 (en) | High performance cache and method for operating same | |
JPH0438014B2 (ja) | ||
US20230100573A1 (en) | Memory device, memory device operating method, and electronic device including memory device | |
JP3952856B2 (ja) | Caching method | |
Hussain et al. | PVMC: Programmable vector memory controller | |
US6430651B1 (en) | Memory device for constituting a memory subsystem of a data processing apparatus | |
TWI828052B (zh) | Computer system and memory management method based on a wafer-on-wafer architecture |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | AK | Designated states | Kind code of ref document: A1; Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG US UZ VN YU ZA ZM ZW |
 | AL | Designated countries for regional patents | Kind code of ref document: A1; Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR IE IT LU MC NL PT SE SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
 | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
 | DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101) | Free format text: EXCEPT/SAUF US |
 | WWE | WIPO information: entry into national phase | Ref document number: 10363885; Country of ref document: US |
 | WWE | WIPO information: entry into national phase | Ref document number: 2003512850; Country of ref document: JP |
 | WWE | WIPO information: entry into national phase | Ref document number: 2115/DELNP/2003; Country of ref document: IN |
 | WWE | WIPO information: entry into national phase | Ref document number: 2002745985; Country of ref document: EP |
 | WWE | WIPO information: entry into national phase | Ref document number: 2451003; Country of ref document: CA |
 | WWE | WIPO information: entry into national phase | Ref document number: 2002318809; Country of ref document: AU |
 | WWE | WIPO information: entry into national phase | Ref document number: 20028137671; Country of ref document: CN |
 | WWE | WIPO information: entry into national phase | Ref document number: 1020047000422; Country of ref document: KR |
 | WWP | WIPO information: published in national office | Ref document number: 2002745985; Country of ref document: EP |
 | REG | Reference to national code | Ref country code: DE; Ref legal event code: 8642 |
 | WWW | WIPO information: withdrawn in national office | Ref document number: 2002745985; Country of ref document: EP |