WO2009136402A2 - Register file system and method thereof for enabling a substantially direct memory access - Google Patents
- Publication number: WO2009136402A2 (application PCT/IL2009/000472)
- Authority: WO, WIPO (PCT)
- Prior art keywords: data, instruction, address, register file, processing
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/14—Handling requests for interconnection or transfer
- G06F13/20—Handling requests for interconnection or transfer for access to input/output bus
- G06F13/28—Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
Definitions
- the present invention relates to data processing. More particularly, the present invention relates to providing a register file system and a method thereof for enabling a substantially direct access to memory means that are coupled to a processing unit, such as a CPU (Central Processing Unit), microprocessor, and the like.
- Fetching means retrieving an instruction from the program memory, wherein the instruction is represented by a number or by a sequence of numbers.
- Instruction Register: a register that stores the current instruction to be executed.
- the instruction register is provided within a processing unit, and is located in physical proximity to processing means, such as ALU (Arithmetic Logic Unit).
- ALU: Arithmetic Logic Unit.
- Opcode: the opcode (operation code) is the portion of a machine-language instruction that specifies the operation to be performed (e.g., addition, subtraction, and the like).
- Operand: an instruction operand is a data value, or a pointer (address) to the data, on which (or by means of which) an operation (e.g., addition, subtraction, and the like) is performed.
- Register File: a storage unit located within the processing unit, such as the CPU. Generally, the register file is a combination of registers and combinatorial logic.
Background of the Invention
- a conventional central processing unit operates in four steps: a) fetching; b) decoding (which involves reading data from the CPU register file); c) executing the instruction; and d) writing back the result of said execution.
- in the first step, fetching, an instruction is retrieved from the program memory (e.g., RAM (Random Access Memory)). The instruction's location in the program memory is determined by a program counter, which keeps track of the CPU's position in the current program.
- the value of the program counter is incremented by the length of the instruction word in terms of memory units; also, for example, when a conventional JUMP or BRANCH command is received, the program counter value is changed accordingly.
- in some cases, the instruction to be fetched must be retrieved from relatively slow memory (e.g., secondary memory) by means of a conventional Input/Output control unit, causing the CPU to stall while waiting for the instruction to be returned.
- the instruction that the CPU fetches from the memory determines what the CPU has to do; thus, the CPU cannot proceed until the instruction has been fetched from the memory.
- in the decoding step, the instruction is broken up into several portions to be processed by other CPU units (e.g., the ALU).
- the way in which the numerical instruction value is interpreted is defined by the CPU instruction set architecture (ISA).
- ISA: CPU instruction set architecture.
- a group of numbers in the instruction called an opcode (operation code) indicates which operation has to be performed.
- the remaining numbers in the instruction usually provide information required for that instruction (e.g., operands for the addition/subtraction operation).
- operands may be given as a constant value (called an "immediate" value).
- operands may be provided as addresses of corresponding values stored in a register file (that comprises a plurality of registers, e.g., 32 or 64 registers).
- next, the executing step is performed.
- the CPU performs the desired operation. If, for example, an addition operation is requested, the numbers to be added are provided to inputs of the Arithmetic Logic Unit (ALU), and the result (the final sum) will be provided at the ALU outputs.
- the ALU comprises circuitry to perform simple arithmetic and logical operations on its inputs, such as addition and subtraction.
- the results of the executing step are "written back" to the register file or to CPU registers. After the instruction has been executed and the resulting data written back, the entire process repeats with the next instruction cycle, normally fetching the next-in-sequence instruction because of the incremented value in the program counter.
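The four-step cycle and the program-counter behavior described above can be illustrated with a toy interpreter. This is a minimal sketch, not the patent's design: the tuple instruction format and the opcode names `LI`, `ADD`, `SUB`, `JUMP` and `HALT` are hypothetical, and the program counter counts whole instructions rather than memory units.

```python
MASK32 = 0xFFFFFFFF  # results are truncated to 32 bits, like a 32-bit register


def run(program, num_registers=32):
    """Interpret a list of hypothetical tuple instructions; return the register file."""
    regs = [0] * num_registers  # local register file
    pc = 0                      # program counter (index of the next instruction)
    while 0 <= pc < len(program):
        opcode, *operands = program[pc]   # fetch, then decode into opcode + operands
        next_pc = pc + 1                  # default: next-in-sequence instruction
        dest = None
        if opcode == "LI":                # load immediate: operand is a constant value
            dest, result = operands
        elif opcode == "ADD":             # operands are register-file addresses
            dest, src1, src2 = operands
            result = regs[src1] + regs[src2]  # execute on the "ALU"
        elif opcode == "SUB":
            dest, src1, src2 = operands
            result = regs[src1] - regs[src2]
        elif opcode == "JUMP":            # program counter is changed accordingly
            next_pc = operands[0]
        elif opcode == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
        if dest is not None:
            regs[dest] = result & MASK32  # write back
        pc = next_pc
    return regs


regs = run([("LI", 1, 5), ("LI", 2, 7), ("ADD", 3, 1, 2), ("HALT",)])
print(regs[3])  # 12
```

A `JUMP` simply overrides the default `pc + 1`, which mirrors the way a JUMP or BRANCH command changes the program counter instead of letting it advance to the next instruction.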
- the above four CPU steps have to be performed relatively fast.
- when data has to be accessed in non-local memory means, such as cache, on-board memory (e.g., DRAM (Dynamic Random Access Memory)), secondary memory and the like, the access time is greatly increased, leading to significant delays and to a waste of valuable CPU processing resources. In turn, this greatly decreases CPU performance and can consume most of the CPU processing time.
- a conventional processing unit (e.g., a CPU) uses a limited set of registers (e.g., 32 or 64 registers).
- the register file can be implemented in hardware by means of a plurality of electronic elements, such as latches, flip-flops, memory arrays, multi-port SRAM (Static Random Access Memory) and the like.
- this register file is a portion of the CPU, and it is located in physical proximity to the ALU (Arithmetic Logic Unit) of said CPU.
- One of the reasons for having a limited local register file is the limited size of the CPU program word, which usually contains pointers to 3 registers: one register (accessed via the "source 1" input of the local register file) storing the first value to be processed by the ALU, another register (accessed via the "source 2" input of the local register file) storing the second value to be processed by said ALU, and the last register (the destination register, accessed via the "destination" input of the local register file) storing the result value of the ALU processing (e.g., the sum of the values stored within said sources 1 and 2). Since the CPU program word is limited (in terms of data bits), the number of bits available for each of the above register addresses is small.
- VLIW: Very Large Instruction Word.
- the CPU register file relatively rarely reaches a capacity of 256 registers.
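The encoding pressure can be made concrete with a little arithmetic. The sketch below assumes a 32-bit program word with a 6-bit opcode (the widths used in the prior-art instruction example in this document) and three register-address fields; the constant and helper names are hypothetical.

```python
WORD_BITS = 32   # assumed program-word width
OPCODE_BITS = 6  # assumed opcode width


def register_field_bits(num_registers: int) -> int:
    """Bits needed to encode one register address (at least one bit)."""
    return max(1, (num_registers - 1).bit_length())


def spare_bits(num_registers: int) -> int:
    """Bits left after the opcode and the source 1, source 2 and destination
    fields; a negative value means the word is too small."""
    return WORD_BITS - OPCODE_BITS - 3 * register_field_bits(num_registers)


for n in (32, 64, 256, 2**32):
    print(n, register_field_bits(n), spare_bits(n))
# 32 5 11
# 64 6 8
# 256 8 2
# 4294967296 32 -70
```

With 256 registers only two spare bits remain. Giving each of the three fields a full 32-bit address would need 96 bits for the addresses alone, which suggests why addressing a whole memory map directly calls for a much wider program word.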
- Another reason for having a limited number of registers in the CPU register file is hardware limitations related to fast memory access and to the ability to provide a relatively large number of ports.
- a conventional ALU requires providing at least two read ports and one write port.
- a conventional system that implements CPU 5 also usually contains a memory controller (that comprises an MMU (Memory Management Unit)), various memory means (e.g., cache, SRAM, etc.), and different peripherals, such as cache controllers, interrupt controllers, timers, hardware accelerators, a DMA engine, communication controllers (e.g., a USB controller) and the like.
- the memory controller controls the CPU access to a wide range of registers/memory means, such as internal CPU memories (program and data), on-chip memories (including, for example, cache memory), on-chip peripheral memories, and off-chip (device) memories.
- the CPU local register file is significantly limited in its size (e.g., contains only 32 registers), and the CPU memory mapped registers (to be accessed, for example, by CPU internal units, such as the ALU) are physically located outside the CPU local register file (e.g., cache, secondary memory, etc.).
- the CPU needs to generate LOAD commands for loading data, by means of the memory controller, from each of said memory mapped registers (e.g., located off-CPU-chip (outside the CPU chip)) into registers of the CPU local register file.
- the CPU can manipulate said data (e.g., to perform data addition or data subtraction operations by means of its ALU unit). Then, the result is first stored in another register of the CPU local register file, and after that said result is conveyed to the corresponding memory mapped register (for example, non-local device/peripheral (located off-CPU-chip)) for updating it with a new data value - the result of ALU processing. For that, the CPU needs to generate at least one STORE command for storing said result within said non-local device/peripheral.
- a single ALU command (e.g., addition, subtraction, etc.) is related to processing of data located within at least two registers.
- the CPU needs to generate at least two separate LOAD commands (each in a single CPU clock cycle) for loading the data required for processing.
- a LOAD or multi-LOAD command for loading data from an external register/memory means (device register, cache memory, etc.);
- an ALU data processing command for executing various operations (e.g., addition, subtraction);
- a STORE command for writing back the result of ALU processing into the corresponding non-local register/memory means. Thus, even when working in a pipeline and avoiding data hazards, such data processing takes at least three CPU clock cycles.
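The conventional LOAD / ALU / STORE sequence above can be sketched as follows. The addresses, the dictionary standing in for memory-mapped device registers, and the function name are all hypothetical. Each trace entry represents one issued instruction; actual clock-cycle counts (pipelining, stalls) are not modeled.

```python
MASK32 = 0xFFFFFFFF


def conventional_add(mem, src1, src2, dst, multi_load=True):
    """Add two memory-mapped values when the ALU can only read the local
    register file: LOAD the operands, ADD them, then STORE the result."""
    regs = [0] * 32          # local register file
    trace = []               # one entry per issued instruction
    if multi_load:
        regs[1], regs[2] = mem[src1], mem[src2]
        trace.append("MLOAD r1, r2")
    else:
        regs[1] = mem[src1]
        trace.append("LOAD r1")
        regs[2] = mem[src2]
        trace.append("LOAD r2")
    regs[3] = (regs[1] + regs[2]) & MASK32   # the ALU sees only local registers
    trace.append("ADD r3, r1, r2")
    mem[dst] = regs[3]                       # the result goes back to the device
    trace.append("STORE r3")
    return trace


device = {0x4000_0000: 10, 0x4000_0004: 32, 0x4000_0008: 0}
print(conventional_add(device, 0x4000_0000, 0x4000_0004, 0x4000_0008))
print(device[0x4000_0008])  # 42
```

Even with a multi-LOAD, three instructions are issued for a single addition. Separate LOADs raise the count to four.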
- DMA: Direct Memory Access.
- DMA operations can be conducted in parallel with CPU operations.
- however, dedicated hardware is required, and the DMA engines need to be configured and enabled by the CPU; further, DMA is applicable only when no data processing (or only negligible data processing) is required.
- Fig. 1A is a schematic block-diagram 100 of a conventional processing unit, according to the prior art.
- an instruction that comprises an opcode and one or more operands
- the program memory e.g., RAM 140
- instruction register 105 is 32 bits long (bits [0...31]), wherein the first six bits [0...5] of the instruction provided within said instruction register 105 are the opcode (which defines the operation to be performed, e.g., addition, subtraction, etc.); bits [11...15] are the address of the destination register within a register file 106 (the address of the register in which the result of ALU processing will be stored); bits [16...20] are the address of the "first" (Source 1) register (within register file 106), the value of which has to be manipulated (processed); and bits [21...25] are the address of the "second" (Source 2) register (within register file 106), the value of which also has to be manipulated (for example, added to the value of said "first" register).
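A decoder for the 32-bit layout above might look like the sketch below. It assumes bit 0 is the most significant bit (MSB-0 numbering). The text does not state the bit order, so this is an assumption, and the helper names are hypothetical. Bits [6...10] and [26...31] are simply left unused here.

```python
def bits(word: int, first: int, last: int) -> int:
    """Return bits [first...last] of a 32-bit word, with bit 0 as the MSB."""
    width = last - first + 1
    return (word >> (31 - last)) & ((1 << width) - 1)


def decode(word: int) -> dict:
    """Split an instruction into the opcode and three register addresses."""
    return {
        "opcode": bits(word, 0, 5),
        "dest": bits(word, 11, 15),
        "src1": bits(word, 16, 20),
        "src2": bits(word, 21, 25),
    }


def encode(opcode: int, dest: int, src1: int, src2: int) -> int:
    """Inverse of decode(); unused bits are left as zero."""
    return (((opcode & 0x3F) << 26) | ((dest & 0x1F) << 16)
            | ((src1 & 0x1F) << 11) | ((src2 & 0x1F) << 6))


word = encode(opcode=0b100000, dest=3, src1=1, src2=2)
print(decode(word))  # {'opcode': 32, 'dest': 3, 'src1': 1, 'src2': 2}
```

Each register field is 5 bits wide, so it can name only 2^5 = 32 registers. This is the program-word limit on local register file size discussed above.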
- the addresses of the above Sources 1 and 2 are inputted into decoder(s) 120' of register file 106, and as a result, the data of corresponding registers (to which said addresses are related) of said register file 106 is outputted over data bus (one or more lines) 141.
- the next step depends on the specific instruction to be processed, and can be, for example: a) reading data from on-CPU-chip (inside the CPU chip) memory/peripherals, or off-CPU-chip (outside the CPU chip) memory/peripherals (by issuing a LOAD command); b) storing data within said memory/peripherals (by issuing a STORE command); and/or c) activating execution unit 130 (e.g., ALU) for performing a mathematical operation, such as addition, subtraction, multiplication, or division; in this case, the operands for the execution unit processing are determined by means of control unit 115.
- execution unit 130 e.g., ALU
- the result is written back into the destination register within said register file 106 (the destination register address is defined by bits [11...15] of the executed instruction). Further, the result can be written back into the CPU memory means/peripherals 160 by means of Input/Output Control Unit 150 over bus 108 (by executing a STORE command). Then, the cycle starts over with the next instruction to be fetched, decoded and executed. Since a program counter 110 holds the address of the current instruction to be executed (and points to a corresponding RAM 140 memory address by means of address bus 119), the CPU always "knows" where within said RAM 140 the next instruction can be found. Each time an instruction is completed, program counter 110 is incremented by at least one memory address location; also, for example, when the instruction is a conventional JUMP or BRANCH command, the program counter is changed accordingly.
- CPU register file 106 is local, and it is a portion of the CPU chip (core).
- I/O control unit 150 e.g., comprising memory controller or memory management unit (MMU)
- MMU: memory management unit.
- ALU operations are not performed directly on the data stored within CPU mapped peripherals/memory means, and these peripherals/memory means are not accessed directly by means of said ALU 130: the data inputted into the ALU is incoming from local register file 106, to which it is loaded from corresponding memory/peripherals by means of Input/Output (I/O) control unit 150, for example. Therefore, for performing manipulation on data stored outside local register file 106, the data has first to be loaded into said local register file 106 by means of I/O control unit 150, thereby executing a LOAD command, and loading the data into the CPU local register file over load/store bus 108.
- I/O: Input/Output (as in I/O control unit 150).
- control unit 115 (over control bus 121), which can comprise a controller 126, multiplexers 125, decoders 120" and the like.
- Control unit 115 receives data to be processed from local register file 106 over data bus 141, and it controls execution unit 130 processing by sending to said execution unit 130 a control signal over bus 121 in accordance with the instruction opcode.
- execution unit 130 receives the corresponding instruction operands to be processed from said control unit 115, and outputs a result of said processing over bus 108.
- Fig. 1B is a schematic illustration of a conventional (local) register file 106, according to the prior art.
- instruction register (IR) 105 (Fig. 1A) is 32 bits long (bits [0...31]), wherein bits [11...15] are the address of the destination register within register file 106 (the address of the register in which the result of ALU 130 (Fig. 1A) processing will be stored).
- bits [16...20] are the address of the "first" (Source 1) register (within register file 106), the value of which has to be manipulated; and bits [21...25] are the address of the "second" (Source 2) register (within register file 106), the value of which also has to be manipulated (for example, added to the value of said "first" register).
- the rest of the instruction bits (within 32 bits of said instruction) can be related to various data, such as an "immediate" value, auto-increment, etc.
- the addresses of the above Sources 1 and 2 are inputted into register file 106 (IR[16...20] and IR[21...25], respectively) and conveyed to decoders 120 (Fig. 1A). Decoders 120 decode the addresses and enable outputting the data of the corresponding registers of register file 106 towards the ALU for further processing (e.g., addition or subtraction of the data and the like). Thus, the data is outputted through the "Source 1 Data" and "Source 2 Data" outputs, each 32 bits long. After ALU 130 processes said data, it stores the result (32 bits long) in a destination register within register file 106 (the destination register is defined by the IR[11...15] address).
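Behaviorally, the local register file of Fig. 1B can be modeled as 32 registers of 32 bits, with two read ports and one write port. The class below is a hypothetical sketch for illustration only. Index validation stands in for the address decoders, and hardware timing is not modeled.

```python
class LocalRegisterFile:
    """32 x 32-bit registers with two read ports and one write port."""

    SIZE = 32
    MASK = 0xFFFFFFFF

    def __init__(self) -> None:
        self._regs = [0] * self.SIZE

    def _select(self, addr: int) -> int:
        # Stands in for the decoders: a 5-bit address selects one register.
        if not 0 <= addr < self.SIZE:
            raise IndexError(f"register address {addr} out of range")
        return addr

    def read(self, src1: int, src2: int) -> tuple:
        """Drive the "Source 1 Data" and "Source 2 Data" outputs."""
        return self._regs[self._select(src1)], self._regs[self._select(src2)]

    def write(self, dest: int, value: int) -> None:
        """Store a (32-bit truncated) result in the destination register."""
        self._regs[self._select(dest)] = value & self.MASK


rf = LocalRegisterFile()
rf.write(1, 40)
rf.write(2, 2)
a, b = rf.read(1, 2)
rf.write(3, a + b)       # ALU addition, result written back to register 3
print(rf.read(3, 3)[0])  # 42
```

Only addresses 0 to 31 are valid. Any data held outside this small array must first be loaded in before the ALU can use it.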
- US 6,178,482 discloses a system embedded with a processor, containing sets of cache lines for accessing cache memories, which are dynamically operated as different register sets for supplying source operands and in turn, accepting destination operands for instruction execution.
- the different register sets may be of the same or of different virtual register files, and if the different register sets are of different virtual register files, the different virtual register files may be of the same or of different architectures.
- the cache memories may be directly accessed by using cache addresses.
- US 6,178,482 presents a data processing apparatus which uses a register file to provide a faster alternative to indirect memory addressing.
- a functional unit is connected to a data register file which comprises a plurality of registers, each of which is accessed by a corresponding register number.
- the functional unit of US 6,178,482 can execute at least one indirect register access instruction that comprises an operand register number field.
- Instruction decode circuitry, connected to the register file and the functional unit, is responsive to the indirect register access instruction to recall data stored in an operand register specified by the operand register number in the instruction, identify the recalled data as a register access number, and recall operand data from a data register corresponding to the register access number for use as an operand by the functional unit.
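The indirect register access described above reduces to two register-file reads, where the first read supplies the register number for the second. The function below is an illustrative paraphrase with hypothetical names, not code from the cited patent.

```python
def indirect_operand(regs: list, operand_reg: int) -> int:
    """Read regs[operand_reg], treat that value as a register access number,
    and return the operand data held in the register it names."""
    access_number = regs[operand_reg]  # first read: recall the register number
    if not 0 <= access_number < len(regs):
        raise IndexError(f"register access number {access_number} out of range")
    return regs[access_number]         # second read: the actual operand


regs = [0] * 16
regs[4] = 9      # register 4 holds a register number
regs[9] = 123    # register 9 holds the operand data
print(indirect_operand(regs, 4))  # 123
```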
- the present invention has many advantages over the prior art.
- one advantage of the present invention is that it significantly reduces the number of instructions and CPU clock cycles required for manipulating/processing (e.g., performing addition, subtraction, data moving, data shifting operations and the like) memory mapped data by providing a substantially direct memory means access for one or more CPU execution units (for processing the data).
- the number of instructions and corresponding CPU clock cycles for processing the data can be reduced, for example, to a single instruction that takes a single CPU clock cycle.
- Another advantage of the present invention is that it can significantly expand the conventional CPU register file to the entire (complete) CPU memory map, thereby providing novel CPU architecture and enabling substantially direct memory access.
- the expanded register file of said CPU can be further shared with other CPUs, or with other internal/external (on-chip/off-chip) peripherals or devices.
- Still another advantage of the present invention is that it provides a method and system in which the number of instructions and CPU clock cycles required for manipulating/processing memory mapped register data is reduced with substantially no need to change the structure of the conventional CPU program word.
- Still another advantage of the present invention is that it eliminates the need to use conventional DMA engines.
- a further advantage of the present invention is that it provides a method and system, in which the size of external memory means of conventional processing devices (such as conventional cache or tightly coupled memories, as used in the prior art architectures) can be significantly reduced and/or the need for using the external memory means can be eliminated.
- Still a further advantage of the present invention is that it provides a method and system, in which CPU stalls (delays) are substantially prevented.
- the register file system comprises: a) a plurality of data units, each comprising a plurality of memory cells that are assigned with memory data unit addresses; b) at least one address converter, connected to one or more data units, for receiving one or more mapped addresses and converting them into the memory data unit addresses, wherein at least one first memory data unit address is of one or more memory cells that store the data to be processed and at least one second memory data unit address is of one or more memory cells for storing a result of the data processing; and c) at least one output port for outputting said data to be processed from said one or more memory cells that correspond to said at least one first memory data unit address.
- the register file system further comprises at least one control input port configured to receive an opcode of an instruction to be processed.
- the one or more mapped addresses are provided within an instruction to be processed.
- the register file system further comprises at least one address generator for generating the at least one mapped address.
- an instruction to be processed comprises data based on which the at least one mapped address is generated.
- the register file system further comprises a control unit connected to the at least one control input port for receiving the opcode and enabling processing the instruction according to said opcode.
- each data unit is configured to receive a read-enable command for enabling reading data from its one or more corresponding memory cells.
- each data unit is configured to receive a write-enable command for enabling writing data into its one or more corresponding memory cells.
- the data units are selected from one or more of the following: a) peripherals; b) memory means; and c) registers.
- At least a portion of the register file system is incorporated within a processing unit.
- the register file system is used by means of at least one processing unit.
- the register file system further comprises one or more execution units for processing at least a portion of the data outputted from said register file system.
- the one or more execution units process the data outputted from said register file system according to an instruction opcode.
- the execution unit is an Arithmetic Logic Unit.
- the register file system further comprises an instruction register for storing at least one instruction to be processed, said at least one instruction comprising an opcode and one or more operands.
- the register file system further comprises a program counter for providing an address of the next instruction to be processed.
- At least one data unit is shared between two or more processing units.
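The register file system described above can be sketched behaviorally, assuming one memory cell per mapped address and a simple table of (base address, data unit) regions. The class names, the memory map and the addresses are hypothetical. The `read_enable`/`write_enable` flags mirror the read-enable and write-enable commands.

```python
from dataclasses import dataclass, field


@dataclass
class DataUnit:
    """A peripheral, memory or register bank whose memory cells are
    addressed locally by memory data unit addresses 0 .. size-1."""
    name: str
    size: int
    cells: list = field(init=False, repr=False)

    def __post_init__(self) -> None:
        self.cells = [0] * self.size

    def read(self, addr: int, read_enable: bool = True) -> int:
        if not read_enable:
            raise PermissionError(f"{self.name}: read not enabled")
        return self.cells[addr]

    def write(self, addr: int, value: int, write_enable: bool = True) -> None:
        if not write_enable:
            raise PermissionError(f"{self.name}: write not enabled")
        self.cells[addr] = value


class AddressConverter:
    """Converts a mapped address into (data unit, memory data unit address)."""

    def __init__(self, regions: list) -> None:
        self.regions = regions  # list of (base mapped address, DataUnit)

    def convert(self, mapped: int):
        for base, unit in self.regions:
            if base <= mapped < base + unit.size:
                return unit, mapped - base
        raise LookupError(f"unmapped address {mapped:#x}")


usb = DataUnit("usb_regs", 16)
sram = DataUnit("sram", 1024)
conv = AddressConverter([(0x1000, usb), (0x8000, sram)])

sram.write(4, 7)
sram.write(5, 35)
# Source 1, source 2 and destination are given as mapped addresses.
(u1, a1), (u2, a2), (ud, ad) = (conv.convert(m) for m in (0x8004, 0x8005, 0x1002))
ud.write(ad, u1.read(a1) + u2.read(a2))  # operate on the mapped operands directly
print(usb.read(2))  # 42
```

The addition reads from SRAM and writes into the (hypothetical) USB register bank without staging the data in a separate local register file.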
- register file system comprises: a) a plurality of data units, each comprising a plurality of memory cells that are assigned with memory data unit addresses, each data unit configured to: a.1. receive at least one mapped address; a.2. decode the received at least one mapped address and determine corresponding at least one memory data unit address; a.3. output data to be processed from one or more memory cells that correspond to said at least one memory data unit address; and a.4. store data within one or more memory cells that correspond to said at least one memory data unit address; and b) at least one output port for outputting said data to be processed from said one or more memory cells.
- the processing unit device comprises: a) a register file system, comprising: a.1. a plurality of data units, each comprising a plurality of memory cells that are assigned with memory data unit addresses; a.2. at least one address converter, connected to one or more data units, for receiving one or more mapped addresses and converting them into the memory data unit addresses, wherein at least one first memory data unit address is of one or more memory cells that store the data to be processed and at least one second memory data unit address is of one or more memory cells for storing a result of the data processing; and a.3. at least one output port for outputting said data to be processed from said one or more memory cells that correspond to said at least one first memory data unit address; and b) at least one execution unit for receiving said data outputted from said register file system and processing it.
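In this variant no central address converter is needed. Every data unit watches the mapped address and decodes it itself, much like address decoding on a shared bus. This is a minimal sketch with a hypothetical class name, helper names and memory map.

```python
class SelfDecodingUnit:
    """A data unit that decodes mapped addresses on its own."""

    def __init__(self, name: str, base: int, size: int) -> None:
        self.name, self.base, self.size = name, base, size
        self.cells = [0] * size

    def decode(self, mapped: int):
        """Return the local memory data unit address, or None if not ours."""
        offset = mapped - self.base
        return offset if 0 <= offset < self.size else None


def find_owner(units, mapped):
    """Broadcast the mapped address; exactly one unit must claim it."""
    hits = []
    for unit in units:
        offset = unit.decode(mapped)
        if offset is not None:
            hits.append((unit, offset))
    if len(hits) != 1:
        raise LookupError(f"{len(hits)} data units claim address {mapped:#x}")
    return hits[0]


def read(units, mapped):
    unit, offset = find_owner(units, mapped)
    return unit.cells[offset]


def write(units, mapped, value):
    unit, offset = find_owner(units, mapped)
    unit.cells[offset] = value


units = [SelfDecodingUnit("timer", 0x2000, 8), SelfDecodingUnit("dram", 0x10000, 4096)]
write(units, 0x10010, 20)
write(units, 0x2003, 22)
write(units, 0x10011, read(units, 0x10010) + read(units, 0x2003))
print(read(units, 0x10011))  # 42
```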
- the processing unit device further comprises an instruction register for storing at least one instruction to be processed, said at least one instruction comprising an opcode and one or more operands.
- the register file system further comprises at least one control input port configured to receive an opcode of an instruction to be processed.
- the one or more operands are the mapped addresses.
- the processing unit device further comprises at least one address generator for generating the one or more mapped addresses.
- one or more mapped addresses are generated according to data provided within an instruction to be processed.
- the processing unit device further comprises a control unit connected to the at least one control input port for receiving the opcode and enabling processing the instruction according to said opcode.
- each data unit is configured to receive a read-enable command for enabling reading data from its one or more corresponding memory cells.
- each data unit is configured to receive a write-enable command for enabling writing data into its one or more corresponding memory cells.
- the data units are selected from one or more of the following: a) peripherals; b) memory means; and c) registers.
- the at least one execution unit processes the data outputted from the register file system according to the instruction opcode.
- the execution unit is an Arithmetic Logic Unit.
- the processing unit device further comprises a program counter for providing an address of the next instruction to be processed.
- a processing unit device comprises: a) a register file system, comprising: a.1. a plurality of data units, each comprising a plurality of memory cells that are assigned with memory data unit addresses, each data unit configured to: a.1.1. receive at least one mapped address; a.1.2. decode the received at least one mapped address and determine corresponding at least one memory data unit address; a.1.3. output data to be processed from one or more memory cells that correspond to said at least one memory data unit address; and a.1.4. store data within one or more memory cells that correspond to said at least one memory data unit address; and a.2. at least one output port for outputting said data to be processed from said one or more memory cells; and b) at least one execution unit for receiving said data outputted from said register file system and processing it.
- the method of processing a processing unit (PU) instruction comprises: a) conveying an address of a PU instruction to be processed into a program memory; b) fetching said PU instruction from said program memory; c) performing first decoding of said PU instruction and determining its opcode and one or more operands, wherein at least one of said operands is a PU mapped address; d) performing second decoding or converting the at least one PU mapped address into an address of one or more memory cells of a corresponding PU data unit, giving rise to a data unit address; e) enabling reading the data stored at the at least one data unit address; f) processing the read data; and g) writing back the result of said processing into the one or more memory cells within the corresponding PU data unit.
- method of processing a processing unit (PU) instruction comprises: a) conveying an address of a PU instruction to be processed into a program memory; b) fetching said PU instruction from said program memory; c) performing first decoding of said PU instruction and determining its opcode and one or more operands; d) generating a corresponding PU mapped address for the at least one operand; e) performing second decoding or converting each generated PU mapped address into an address of one or more memory cells of a corresponding PU data unit, giving rise to a data unit address; f) enabling reading the data stored in the data unit addresses; g) processing the read data; and h) writing back the result of said processing into the one or more memory cells within the corresponding PU data unit.
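The steps of these two methods can be traced in a compact sketch. Comments follow the lettering of the second method. When `base` is zero, the generated mapped address equals the operand, which corresponds to the first method, where the operands themselves are mapped addresses. The base-plus-offset address generator, the tuple instruction format and the memory map are hypothetical choices for illustration; the text does not specify how mapped addresses are generated.

```python
MASK32 = 0xFFFFFFFF
ALU = {"ADD": lambda x, y: x + y, "SUB": lambda x, y: x - y}


def second_decode(memory_map, mapped):
    """Convert a mapped address into (data unit name, data unit address)."""
    for base, name, size in memory_map:
        if base <= mapped < base + size:
            return name, mapped - base
    raise LookupError(f"unmapped address {mapped:#x}")


def step(program, pc, units, memory_map, base=0):
    """Process one instruction and return the next program-counter value."""
    opcode, dest, src1, src2 = program[pc]             # a-c) fetch, first decoding
    mapped = [base + op for op in (src1, src2, dest)]  # d) generate mapped addresses
    (u1, a1), (u2, a2), (ud, ad) = (second_decode(memory_map, m) for m in mapped)  # e)
    x, y = units[u1][a1], units[u2][a2]                # f) read the data
    result = ALU[opcode](x, y) & MASK32                # g) process
    units[ud][ad] = result                             # h) write back
    return pc + 1


units = {"sram": [5, 7, 0, 0], "usb": [0, 0]}
memory_map = [(0x100, "sram", 4), (0x200, "usb", 2)]
program = [("ADD", 0x200, 0x100, 0x101)]   # usb[0] <- sram[0] + sram[1]
pc = step(program, 0, units, memory_map)
print(units["usb"][0], pc)  # 12 1
```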
- Fig. 1A is a schematic block-diagram of a conventional processing unit, according to the prior art
- Fig. 1B is a schematic illustration of a conventional (local) register file, according to the prior art
- Fig. 2A is a schematic illustration of connecting a spread register file system to an instruction register and to an execution unit (such as ALU), according to an embodiment of the present invention
- Fig. 2B is another schematic illustration of connecting a spread register file system to an instruction register and to an execution unit (such as ALU), according to another embodiment of the present invention
- Fig. 3 is a schematic illustration of a spread register file system, according to an embodiment of the present invention.
- Fig. 4 is a pipeline representation of operating with a spread register file system, according to an embodiment of the present invention.
- SRF: spread register file (system).
- LRF: local register file.
- the entire (complete) CPU memory map can comprise local registers (e.g., local CPU register files), cache memories, tightly coupled memories, on-chip/off-chip peripherals/memories (or registers) and any other conventional memory means.
- CPU: processing unit.
- Processing: a data operation, such as data manipulation, data transfer, addition or subtraction of data, and the like.
- Fig. 2A is a schematic illustration of connecting a spread register file system 206 to instruction register 105 and to execution unit 130 (such as ALU), according to an embodiment of the present invention.
- spread register file system 206 relates to the entire (complete) CPU mapped memories: cache memories, on-chip peripherals (e.g., RAM, SRAM), tightly coupled memories, on-board memories (e.g., DRAM), secondary memories (e.g., off-chip peripherals, hard disks, etc.), and any other memory means (e.g., CDs (Compact Discs), DVDs (Digital Versatile Discs), etc.).
- spread register file system 206 comprises conventional peripheral address converters, as presented in Fig. 3.
- Each peripheral address converter is used for converting a CPU memory mapped address to corresponding peripheral (device) address (e.g., the peripheral device can be a USB device, cache memory, RAM, SRAM, tightly coupled memory, DRAM, hard disks, CD, DVD, etc.).
- one peripheral address converter converts the CPU mapped address of the "source 1" register/memory means to the corresponding address of said register/memory means within the peripheral that actually stores said data (to be processed by means of execution unit 130, such as an ALU); another peripheral address converter converts the CPU mapped address of the "source 2" register/memory means that stores additional data to be processed by means of said execution unit 130; and a third peripheral address converter converts the CPU mapped address of the "destination" register/memory means, in which the result of the above execution unit 130 processing (e.g., addition, subtraction) will be stored.
- the peripheral address converter can be implemented in hardware, in software, or in a combination of both.
- instruction register 105 contains a VLIW program word, which can be, for example, 128 or 256 bits long.
- When the VLIW program word is 128 bits long, each of the following fields can be, for example, 32 bits long: opcode 221', CPU mapped "source 1" address 222', CPU mapped "source 2" address 223', and CPU mapped destination address 224'.
- the VLIW program word is 256 bits long.
- Each of the above addresses relates to a specific address within the entire CPU memory map and is represented, for example, by a 32-bit or a 64-bit binary number (i.e., an address within a space of 2^32 or 2^64 addresses), respectively.
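To make the field layout concrete, here is a minimal Python sketch that packs and unpacks a VLIW program word made of four equal-width fields: 32-bit fields for the 128-bit word and 64-bit fields for the 256-bit word. The field order (opcode in the most significant bits, followed by the "source 1", "source 2" and destination addresses) and the names `VliwWord`, `pack_vliw` and `unpack_vliw` are assumptions for illustration; the text does not specify bit positions.

```python
from typing import NamedTuple


class VliwWord(NamedTuple):
    """Hypothetical decoded view of a VLIW program word (four equal-width fields)."""
    opcode: int  # opcode 221'
    src1: int    # CPU mapped "source 1" address 222'
    src2: int    # CPU mapped "source 2" address 223'
    dest: int    # CPU mapped destination address 224'


def pack_vliw(w: VliwWord, field_bits: int = 32) -> int:
    """Concatenate the fields, opcode first (most significant); this layout is assumed."""
    mask = (1 << field_bits) - 1
    word = 0
    for value in w:  # iterates opcode, src1, src2, dest in declaration order
        if not 0 <= value <= mask:
            raise ValueError(f"{value:#x} does not fit in {field_bits} bits")
        word = (word << field_bits) | value
    return word


def unpack_vliw(word: int, field_bits: int = 32) -> VliwWord:
    """Split a packed word back into its four fields."""
    mask = (1 << field_bits) - 1
    return VliwWord(*((word >> (field_bits * i)) & mask for i in (3, 2, 1, 0)))


# 128-bit word: four 32-bit fields.
w = VliwWord(opcode=0x1, src1=0x4000_0000, src2=0x4000_0004, dest=0x8000_0010)
packed = pack_vliw(w)
assert unpack_vliw(packed) == w

# 256-bit word: four 64-bit fields.
w64 = VliwWord(opcode=0x2, src1=1 << 40, src2=1 << 41, dest=(1 << 64) - 1)
assert unpack_vliw(pack_vliw(w64, 64), 64) == w64
```

Keeping the field width as a parameter lets the same two functions cover both word sizes mentioned above.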
- The command for performing such an operation is provided to execution unit 130 via control bus 234. After the operation is accomplished, the corresponding result is written back over data bus 233 into the destination register/memory means (located, for example, within the corresponding peripheral device, such as a USB device), whose address is defined by CPU mapped destination address 224' of the VLIW program word.
- I/O control units: e.g., memory management units.
- CPU mapped memory means: e.g., cache memories, off-CPU-chip memories, and other memory means.
- Execution unit 130 is enabled to operate substantially directly on each of the CPU mapped memory means provided within spread register file system 206; i.e., it can execute instructions without generating and performing LOAD commands (which load data into spread register file system 206 from external memory means) and the corresponding additional STORE commands (which store the result of the execution unit 130 operation outside spread register file system 206).
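The saving can be illustrated with a toy model, assuming a flat dictionary stands in for the CPU memory map and 32-bit wrap-around arithmetic. The hypothetical `add_via_load_store` follows the conventional LOAD, LOAD, ADD, STORE sequence through a register file, while the hypothetical `add_direct` performs the same addition in one instruction whose operands are CPU mapped addresses. This is a sketch of the idea, not a model of the actual hardware.

```python
from typing import Dict, List, Tuple

MASK32 = 0xFFFF_FFFF
Memory = Dict[int, int]  # hypothetical CPU memory map: address -> 32-bit word


def add_via_load_store(mem: Memory, dst: int, s1: int, s2: int) -> int:
    """Conventional flow through a small register file; returns the instruction count."""
    regs: List[int] = [0] * 32
    program: List[Tuple] = [
        ("LOAD", 1, s1),    # r1 <- mem[s1]
        ("LOAD", 2, s2),    # r2 <- mem[s2]
        ("ADD", 3, 1, 2),   # r3 <- r1 + r2
        ("STORE", 3, dst),  # mem[dst] <- r3
    ]
    for op, *args in program:
        if op == "LOAD":
            rd, addr = args
            regs[rd] = mem[addr]
        elif op == "ADD":
            rd, ra, rb = args
            regs[rd] = (regs[ra] + regs[rb]) & MASK32
        elif op == "STORE":
            rs, addr = args
            mem[addr] = regs[rs]
    return len(program)


def add_direct(mem: Memory, dst: int, s1: int, s2: int) -> int:
    """Memory-to-memory flow: one ADD on CPU mapped addresses; returns the instruction count."""
    mem[dst] = (mem[s1] + mem[s2]) & MASK32
    return 1


mem_a = {0x100: 7, 0x104: 5, 0x200: 0}
mem_b = dict(mem_a)
n_conventional = add_via_load_store(mem_a, 0x200, 0x100, 0x104)  # 4 instructions
n_direct = add_direct(mem_b, 0x200, 0x100, 0x104)                # 1 instruction
```

Both paths leave the same value at the destination address; only the instruction count differs.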
- Spread register file system 206 can operate with more than one execution unit 130. Further, instruction register 105 and/or execution unit 130 can be provided within spread register file system 206.
- Spread register file system 206 can be provided on-CPU-chip (incorporated within a CPU) or off-CPU-chip. Further, according to still another embodiment of the present invention, a portion of spread register file system 206 can be provided on-CPU-chip and another portion off-CPU-chip.
- Fig. 2B is another schematic illustration of connecting spread register file system 206 to instruction register 105 and to execution unit 130 (such as ALU), according to another embodiment of the present invention.
- "Source 1" address 222", "source 2" address 223" and "destination" address 224" are input from instruction register 105 into address generator 250 for generating corresponding addresses within the entire CPU memory map.
- Addresses 222", 223" and 224" can, for example, each be 5 bits long, while the CPU mapped "source 1", "source 2" and "destination" addresses are each 32 bits long (if implemented for a MIPS32 CPU), since each of these mapped addresses relates to the entire CPU memory map, which is represented by 2^32 addresses. Similarly, for a MIPS64 CPU implementation, each of these mapped addresses is 64 bits long. It should be noted that address generator 250 can generate addresses in various ways based on different address generating functions.
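For a MIPS32-style implementation, the three 5-bit fields could be the rs, rt and rd fields of an R-type instruction (bits 25..21, 20..16 and 15..11). Treating rs, rt and rd as the "source 1", "source 2" and "destination" fields is an assumption made for this sketch, and the function name `decode_rtype_operands` is hypothetical.

```python
from typing import NamedTuple


class RTypeFields(NamedTuple):
    src1: int  # MIPS rs field, bits 25..21 (assumed to act as "source 1")
    src2: int  # MIPS rt field, bits 20..16 (assumed to act as "source 2")
    dest: int  # MIPS rd field, bits 15..11 (assumed to act as "destination")


def decode_rtype_operands(instr: int) -> RTypeFields:
    """Extract the three 5-bit operand fields from a 32-bit MIPS32 R-type instruction."""
    field_mask = (1 << 5) - 1  # a 5-bit field holds one of 2**5 = 32 values
    return RTypeFields(
        src1=(instr >> 21) & field_mask,
        src2=(instr >> 16) & field_mask,
        dest=(instr >> 11) & field_mask,
    )


# "add $3, $1, $2" is encoded as 0x00221820 in MIPS32.
fields = decode_rtype_operands(0x00221820)
print(fields)  # RTypeFields(src1=1, src2=2, dest=3)
```

Because each field can name only 2^5 = 32 locations while a CPU mapped address selects one of 2^32, an address generator is needed to expand each field into a full address.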
- For example, address generator 250 can receive as input a 5-bit address (one of 2^5 possible values) from instruction register 105 and add it to a 32-bit number, thereby generating a new 32-bit CPU mapped address.
- The above 32-bit number can be a predefined number, a random number, or a number calculated (generated) by address generator 250 according to one or more predefined functions/expressions.
- The new CPU mapped address can also be generated according to opcode 221", which can be input from instruction register 105 into address generator 250.
- Each of the addresses output from instruction register 105 over lines 251, 252 and 253 can be further related to corresponding registers within address generator 250. For example, each such address can be related to a different 32-bit (MIPS32) or 64-bit (MIPS64) base address stored within address generator 250, from which a CPU mapped address is generated. Thus, each CPU mapped address output from address generator 250 can be generated according to the values stored within these corresponding registers.
- For example, the generated "source 1" CPU mapped address (provided from address generator 250 over line 212) can be the sum of the value of operand 222" and the corresponding "source 1" base address stored within address generator 250.
- Similarly, the generated "source 2" (or "destination") CPU mapped address can be the sum of the value of operand 223" (or 224") and the corresponding "source 2" (or "destination") base address stored within address generator 250.
- address generator 250 stores CPU mapped addresses (e.g., 32 or 64 bits long), and for each operand 222", 223" and 224", said address generator 250 outputs a corresponding CPU mapped address.
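The base-plus-operand scheme can be sketched as follows, assuming one base register per operand role and that the operand is added unscaled (not multiplied by a word size). Wrapping the sum to the address width is also an assumption, as is the class name `AddressGenerator`; other address generating functions are equally possible.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class AddressGenerator:
    """Hypothetical model of address generator 250 with one base register per operand role."""
    addr_bits: int = 32  # 32 for a MIPS32-style memory map, 64 for MIPS64
    bases: Dict[str, int] = field(default_factory=dict)

    def _mask(self) -> int:
        return (1 << self.addr_bits) - 1

    def set_base(self, role: str, base: int) -> None:
        """Load the base register for a role ("source 1", "source 2" or "destination")."""
        self.bases[role] = base & self._mask()

    def generate(self, role: str, operand: int) -> int:
        """CPU mapped address = base register + operand, wrapped to the address width."""
        return (self.bases[role] + operand) & self._mask()


gen = AddressGenerator()
gen.set_base("source 1", 0x4000_0000)
gen.set_base("source 2", 0x4001_0000)
gen.set_base("destination", 0x8000_0000)

# 5-bit operands (0..31) taken from the instruction word select locations near each base.
ms1 = gen.generate("source 1", 3)     # 0x40000003
ms2 = gen.generate("source 2", 7)     # 0x40010007
md = gen.generate("destination", 31)  # 0x8000001F
```

Changing a base register moves the whole 32-location window that the corresponding 5-bit operand can reach.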
- Fig. 3 is a schematic illustration of spread register file system 206, according to an embodiment of the present invention.
- Spread register file system 206 receives as inputs the CPU mapped "source 1" address (MS1 address) over bus (line) 212, the CPU mapped "source 2" address (MS2 address) over bus 213, and the CPU mapped "destination" address (MD address) over bus 214 (each 32 bits long, for example).
- These addresses are converted by address converters 320, 321 and 322, respectively, into addresses of the corresponding peripheral/memory means, such as peripheral/memory means 301, 302, 303, ..., 310 (e.g., cache memories, tightly coupled memories, secondary memories, SRAM, DRAM, disk-on-keys, hard disks, CDs (Compact Discs), DVDs, or any other peripheral/memory means).
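One common way to realize such a converter is a base/size decoder: find the memory region that contains the CPU mapped address and subtract that region's base. The sketch below assumes non-overlapping regions and uses Python's `bisect` module for the lookup. The region layout, device indices and class names are hypothetical, and, as noted below, other converting functions are possible.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Tuple


class Region(NamedTuple):
    base: int    # first CPU mapped address served by the peripheral/memory means
    size: int    # number of addresses it exposes
    device: int  # hypothetical index: 0 for 301, 1 for 302, and so on


class AddressConverter:
    """Hypothetical base/size decoder: CPU mapped address -> (device, local address)."""

    def __init__(self, regions: List[Region]) -> None:
        self.regions = sorted(regions, key=lambda r: r.base)
        for a, b in zip(self.regions, self.regions[1:]):
            if a.base + a.size > b.base:
                raise ValueError("address regions overlap")
        self._bases = [r.base for r in self.regions]

    def convert(self, cpu_addr: int) -> Tuple[int, int]:
        # Find the last region whose base is <= cpu_addr, then check its upper bound.
        i = bisect_right(self._bases, cpu_addr) - 1
        if i >= 0:
            r = self.regions[i]
            if cpu_addr < r.base + r.size:
                return r.device, cpu_addr - r.base
        raise LookupError(f"unmapped CPU address {cpu_addr:#010x}")


conv = AddressConverter([
    Region(base=0x0000_0000, size=0x0001_0000, device=0),  # e.g. a cache or TCM
    Region(base=0x2000_0000, size=0x0004_0000, device=1),  # e.g. an on-chip SRAM
    Region(base=0x8000_0000, size=0x1000_0000, device=2),  # e.g. an off-chip DRAM
])
device, local = conv.convert(0x2000_0010)  # (1, 0x10)
```

Three such converters (or one shared instance) would handle the "source 1", "source 2" and "destination" addresses.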
- said address converters can convert the CPU mapped addresses in various ways based on different address converting functions/expressions.
- The above address can also be converted according to opcode 221' or 221" (Figs. 2A or 2B), which can be input from instruction register 105 (Fig. 2A) into control unit 350 (Fig. 3).
- Control unit 350 generates corresponding control signals to address converters 320, 321 and 322 and to execution unit 130. For example, if opcode 221' or 221" relates to moving "source 1" data to the "destination" register, then only address converters 320 and 322 need be activated.
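The opcode-dependent activation described above can be expressed as a lookup table. In this sketch only the MOVE behaviour comes from the text; the ADD and SUB entries, the `Op` enum and the `converter_enables` function are hypothetical.

```python
from enum import Enum, auto
from typing import Dict, FrozenSet


class Op(Enum):
    ADD = auto()
    SUB = auto()
    MOVE = auto()  # copy "source 1" data to the "destination" register


# Converter 320 handles "source 1", 321 "source 2" and 322 "destination" addresses.
CONVERTERS_FOR_OP: Dict[Op, FrozenSet[int]] = {
    Op.ADD: frozenset({320, 321, 322}),
    Op.SUB: frozenset({320, 321, 322}),
    Op.MOVE: frozenset({320, 322}),  # "source 2" is not needed for a move
}


def converter_enables(op: Op) -> Dict[int, bool]:
    """Enable signal for each address converter, given the decoded opcode."""
    active = CONVERTERS_FOR_OP[op]
    return {conv: conv in active for conv in (320, 321, 322)}


print(converter_enables(Op.MOVE))  # {320: True, 321: False, 322: True}
```

Leaving converter 321 idle for a move avoids an unnecessary "source 2" access.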
- The converted "source 1" and "source 2" addresses are input into the corresponding peripheral/memory means (e.g., peripheral/memory means 301, 302, 303, ..., N), which in turn output the data stored at those addresses over "source 1" read bus 231 and "source 2" read bus 232.
- said data is processed (executed) by means of one or more execution units 130 (such as ALUs).
- the processing result is provided over write back bus 233 to one or more peripheral/memory means to be stored in corresponding converted destination addresses (CD addresses) within said one or more peripheral/memory means.
- The "source 1", "source 2" and "destination" memory cells can be physically located within the same or within different peripheral/memory means (such as peripheral/memory means 301, 302, 303, ..., N).
- address converters 320, 321 and 322 further provide Write Enable (WE)/Chip Select (CS) signals (for example, binary "0" or "1") to each of said peripheral/memory means 301, 302, 303, ...,N for enabling reading or writing from or to said peripheral/memory means (data units) 301, 302, 303, ..., N.
- WE/CS commands can be provided to each of peripheral/memory means (data units) 301, 302, 303, ..., N when accessing each converted address (e.g., the "source 1" converted address) within that peripheral/memory means.
- CS: read command
- WE: write command
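Assuming active-high, one-hot signals (real memories often use active-low chip-select and write-enable pins instead), the per-device CS/WE strobes for a single converted access could be generated as in this hypothetical sketch, where CS selects the addressed peripheral/memory means and WE distinguishes a write from a read.

```python
from typing import List, NamedTuple


class Strobes(NamedTuple):
    cs: List[int]  # chip select per peripheral/memory means (1 = selected)
    we: List[int]  # write enable per peripheral/memory means (1 = write, 0 = read)


def strobes_for_access(device: int, is_write: bool, n_devices: int) -> Strobes:
    """Assert CS on the addressed device only, and WE on it only for a write (active-high)."""
    if not 0 <= device < n_devices:
        raise IndexError(f"device {device} out of range for {n_devices} devices")
    cs = [1 if d == device else 0 for d in range(n_devices)]
    we = [1 if d == device and is_write else 0 for d in range(n_devices)]
    return Strobes(cs, we)


# Read "source 1" from device 1, then write the result to device 2 (4 devices in total).
print(strobes_for_access(1, is_write=False, n_devices=4))  # Strobes(cs=[0, 1, 0, 0], we=[0, 0, 0, 0])
print(strobes_for_access(2, is_write=True, n_devices=4))   # Strobes(cs=[0, 0, 1, 0], we=[0, 0, 1, 0])
```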
- Address converters 320, 321 and 322 can be unified in a single address converter for converting the CPU mapped "source 1", "source 2" and "destination" addresses into the corresponding peripheral/memory means addresses.
- the address decoding (or the address conversion) is performed within one or more peripherals/memory means 301, 302, 303, ..., N.
- Peripherals/memory means 301, 302, 303, ..., N can receive CPU mapped addresses and decode (or convert) them accordingly for determining corresponding addresses within said peripherals/memory means 301, 302, 303, ..., N, in which the required data is stored (or to be stored).
- In that case, blocks 320, 321 and 322 can provide WE/CS commands to peripherals/memory means 301, 302, 303, ..., N without performing the address conversion. Further, it should be noted that the WE/CS commands can be generated by control unit 350.
- an address converter (such as address converter 320, 321 or 322) can be incorporated (integrated) within each (or one or more) peripheral/memory means 301, 302, 303, ..., or N.
- Said peripheral/memory means receives a CPU mapped address and determines, by means of the integrated address converter (according to the predefined base addresses of said peripheral/memory means), whether the received CPU mapped address relates to one or more memory cells provided within said peripheral/memory means or within another peripheral/memory means. It should be noted that the base addresses of each peripheral/memory means can also be changed dynamically as needed.
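A peripheral/memory means with an integrated address converter can be sketched as a device that claims only the CPU mapped addresses inside its own window and ignores the rest. The class `MappedMemory`, the broadcast-style usage and the window defined by a base and a size are assumptions for illustration; `rebase` models the dynamic change of base address mentioned above.

```python
from typing import Dict, Optional


class MappedMemory:
    """Hypothetical peripheral/memory means with an integrated address converter."""

    def __init__(self, base: int, size: int) -> None:
        self.base = base  # current base address in the CPU memory map
        self.size = size  # number of cells in this device
        self.cells: Dict[int, int] = {}

    def rebase(self, new_base: int) -> None:
        """Move the device's window in the CPU memory map at run time."""
        self.base = new_base

    def local_address(self, cpu_addr: int) -> Optional[int]:
        """Local cell index if cpu_addr is inside this device's window, else None."""
        offset = cpu_addr - self.base
        return offset if 0 <= offset < self.size else None

    def read(self, cpu_addr: int) -> Optional[int]:
        local = self.local_address(cpu_addr)
        return None if local is None else self.cells.get(local, 0)

    def write(self, cpu_addr: int, value: int) -> bool:
        local = self.local_address(cpu_addr)
        if local is None:
            return False  # the address belongs to another peripheral/memory means
        self.cells[local] = value
        return True


# Every device sees the same CPU mapped address; only the owner claims it.
devices = [MappedMemory(base=0x1000, size=0x100), MappedMemory(base=0x2000, size=0x100)]
claimed = [d.write(0x2004, 99) for d in devices]  # [False, True]
devices[1].rebase(0x3000)                         # same cells, new window
```

After the rebase, the stored value is reachable at 0x3004 and the old address 0x2004 is no longer claimed.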
- Fig. 4 is a pipeline representation 400 of operating with spread register file system 206 (Fig. 2B), according to an embodiment of the present invention.
- The pipeline has 12 stages (T0 to T11), each of which can correspond to a single CPU clock cycle.
- the address of the instruction (program word) to be fetched is conveyed from the Program Counter (PC) to the CPU program memory (e.g., RAM (not shown)).
- the instruction (program word) is fetched from said program memory into instruction register 105 (Fig. 2A).
- The fetched instruction is decoded by control unit 350 (Fig. 3), which provides control signals during the pipeline stages.
- address generator 250 (Fig. 2B)
- The next program counter address can be determined based on the decoded instruction. For example, when a conventional JUMP or BRANCH command is issued, the next program counter address is calculated in accordance with a pointer of said JUMP/BRANCH command.
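The next-PC selection can be sketched as below, assuming 128-bit (16-byte) program words in byte-addressed program memory and treating the JUMP/BRANCH pointer as an absolute target address. The instruction size, the `taken` flag for conditional branches and the function name `next_pc` are all assumptions.

```python
from typing import Optional

INSTR_BYTES = 16  # assumed: 128-bit program words in byte-addressed program memory


def next_pc(pc: int, opcode: str, target: Optional[int] = None, taken: bool = False) -> int:
    """Return the address of the next program word to fetch.

    JUMP always redirects to its target; BRANCH redirects only when its (assumed)
    condition is taken; every other opcode falls through to the next word.
    """
    if opcode == "JUMP" or (opcode == "BRANCH" and taken):
        if target is None:
            raise ValueError(f"{opcode} requires a target pointer")
        return target
    return pc + INSTR_BYTES


print(hex(next_pc(0x100, "ADD")))                        # 0x110
print(hex(next_pc(0x100, "BRANCH", 0x400, taken=True)))  # 0x400
```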
- the CPU mapped addresses are converted by means of address converters 320, 321 and 322 (Fig. 3), respectively to addresses of corresponding peripheral/memory means, such as peripheral/memory means 301, 302 and 303 (Fig. 3) (e.g., cache memories, secondary memories, disk-on-keys, hard disks, or any other peripheral/memory means).
- said address converters can convert the CPU mapped addresses in various ways based on different address converting functions/expressions.
- The above address can also be converted according to opcode 221' or 221" (Figs. 2A or 2B), which can be input from instruction register 105 (Fig. 2A) into control unit 350 (Fig. 3).
- said peripherals/memory means 301, 302, 303, ..., N generate a READ request to their internal memory, thereby enabling reading corresponding data stored within them at the received converted addresses.
- said corresponding data is read and ready, and then at stage T7, said data is latched and conveyed to the "source 1" and "source 2" read buses 231 and 232, respectively.
- the data is provided over said read buses 231 and 232 into execution unit 130 (e.g., ALU).
- At stages T9 and T10, the data is processed by execution unit 130. It should be noted that the data can alternatively be processed in stage T9 only, with no further processing required.
- In that case, the pipeline can have, for example, only 11 stages (T0 to T10).
- the processing result is written back into the "destination" register that is provided, for example, within peripherals/memory means 301, 302, 303, ..., N. It should be noted that the writing back operation can take more than a single CPU clock cycle.
- CPU control unit 350 controls the pipeline process by generating required control signals during the pipeline stages.
- CPU stalls are reduced or substantially eliminated. For example, there can be substantially no CPU stalls if the access latency to spread register file (SRF) system 206 is 6 CPU clock cycles or less and the pipeline is relatively deep (e.g., 12 stages).
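The stall condition can be illustrated with a simple in-order model, assuming that a fixed number of pipeline stages (here 6 of the 12) is budgeted for the memory access, that any extra latency stalls the whole pipeline, and that stalls from different instructions do not overlap. The model and its function names are hypothetical and are not taken from the text.

```python
def stall_cycles_per_instr(access_latency: int, access_stages: int) -> int:
    """Extra cycles an instruction waits when its memory access outlasts its stage budget."""
    return max(0, access_latency - access_stages)


def total_cycles(n_instr: int, depth: int, access_latency: int, access_stages: int) -> int:
    """Cycles to retire n_instr instructions on an in-order pipeline of the given depth.

    The pipeline fills once (depth cycles for the first instruction), then retires one
    instruction per cycle, plus any per-instruction stall cycles (assumed not to overlap).
    """
    if n_instr <= 0:
        return 0
    stall = stall_cycles_per_instr(access_latency, access_stages)
    return depth + (n_instr - 1) + n_instr * stall


print(total_cycles(100, depth=12, access_latency=6, access_stages=6))  # 111
print(total_cycles(100, depth=12, access_latency=8, access_stages=6))  # 311
```

With 100 instructions and a 6-cycle latency that fits the 6-stage budget, the model gives 12 + 99 = 111 cycles; raising the latency to 8 cycles adds 2 stall cycles per instruction, for 311 cycles.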
- The number of instructions and CPU clock cycles required for manipulating/processing data is significantly reduced compared to the prior art.
- The number of instructions and corresponding CPU clock cycles for processing the data can be reduced, for example, to a single instruction that takes a single CPU clock cycle, since execution unit 130 has substantially direct access to peripherals/memory means 301, 302, 303, ..., N.
- Spread register file system 206 can be shared between two or more processing units (e.g., CPUs, microprocessors, and the like) and/or with other internal/external (on-chip/off-chip) peripherals or devices.
- The structure of a conventional CPU program word is not changed compared to the prior art.
- The need for conventional DMA engines is eliminated.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Executing Machine-Instructions (AREA)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US7158408P | 2008-05-07 | 2008-05-07 | |
US61/071,584 | 2008-05-07 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2009136402A2 (en) | 2009-11-12 |
WO2009136402A3 WO2009136402A3 (en) | 2010-03-11 |
Family
ID=41265110
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/IL2009/000472 (WO2009136402A2) | Register file system and method thereof for enabling a substantially direct memory access | 2008-05-07 | 2009-05-07 |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2009136402A2 (en) |
- 2009-05-07: WO application PCT/IL2009/000472 filed (published as WO2009136402A2; status: active)
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 09742573; Country of ref document: EP; Kind code of ref document: A2 |
| | NENP | Non-entry into the national phase in: | Ref country code: DE |
| | WPC | Withdrawal of priority claims after completion of the technical preparations for international publication | Ref document number: 61/071,584; Country of ref document: US; Date of ref document: 20101025; Free format text: WITHDRAWN AFTER TECHNICAL PREPARATION FINISHED |
| | 32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 24/01/2012) |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 09742573; Country of ref document: EP; Kind code of ref document: A2 |