US9507600B2 - Processor loop buffer - Google Patents
- Publication number
- US9507600B2 (application US14/164,633)
- Authority
- US
- United States
- Prior art keywords
- loop
- instruction
- buffer
- instructions
- execution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/32—Address formation of the next instruction, e.g. by incrementing the instruction counter
- G06F9/322—Address formation of the next instruction, e.g. by incrementing the instruction counter for non-sequential address
- G06F9/325—Address formation of the next instruction, e.g. by incrementing the instruction counter for non-sequential address for loops, e.g. loop detection or loop counter
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30098—Register arrangements
- G06F9/3012—Organisation of register space, e.g. banked or distributed register file
- G06F9/3013—Organisation of register space, e.g. banked or distributed register file according to data content, e.g. floating-point registers, address registers
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3802—Instruction prefetching
- G06F9/3808—Instruction prefetching for instruction reuse, e.g. trace cache, branch target cache
- G06F9/381—Loop buffering
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3802—Instruction prefetching
- G06F9/3814—Implementation provisions of instruction buffers, e.g. prefetch buffer; banks
Definitions
- Program loops are sequences of instructions that are repeatedly executed. Program loops are frequently employed in various types of software. In a typical program loop, a conditional jump or branch instruction at the end of the loop conditionally redirects execution to the start of the loop. When a loop is executed more than once by a processor, the instructions of the loop may be read multiple times from instruction memory. Repeated access of memory to fetch loop instructions increases energy consumption. Additionally, execution of the loop jump instruction may cause the processor pipeline to stall while awaiting jump decision status or fetching the jump destination instruction from the instruction memory. Stall cycles reduce processor performance. Thus, while incorporation of program loops effectively reduces program size, loop execution can detrimentally affect processor performance.
- a processor includes an execution unit and an instruction fetch buffer.
- the execution unit is configured to execute instructions.
- the instruction fetch buffer is configured to store instructions for execution by the execution unit.
- the instruction fetch buffer is operable to provide a loop buffer configured to store instructions of an instruction loop for repeated execution by the execution unit.
- the loop buffer includes buffer control logic and pointers.
- the buffer control logic is configured to predecode a loop jump instruction and to identify loop start and loop end instructions using the pre-decoded loop jump instruction and the pointers; and to control non-sequential instruction execution of the instruction loop.
- the width of the pointers is determined by loop buffer length and is less than a width of an address bus for fetching the instructions stored in the loop buffer from an instruction memory.
- an instruction fetch buffer includes a loop buffer configured to store instructions of an instruction loop for execution by an execution unit.
- the loop buffer includes buffer control logic comprising pointers, and is configured to identify loop start and loop end instructions using the pointers, to identify which instructions, stored in the instruction buffer, are executed as part of the instruction loop, and to control non-sequential instruction execution of an instruction loop.
- the width of the pointers is determined by loop buffer length and is less than a width of an address bus for fetching the instructions stored in the loop buffer from an instruction memory.
- a method includes partitioning an instruction fetch buffer into a loop buffer and a pre-fetch buffer. Instructions of an instruction loop read from an instruction memory are stored in the loop buffer. A location of a loop branch instruction stored in the loop buffer and a location of a branch destination of the loop branch instruction in the loop buffer are identified by loop control logic of the loop buffer. A first pointer to the location of the loop branch instruction and a second pointer to a location of the branch destination of the loop branch instruction are set by the loop control logic. An instruction pointed to by the second pointer is provided by the loop buffer immediately subsequent to providing the loop branch instruction pointed to by the first pointer without introduction of stall cycles into execution. A forward branch in the instruction loop stored in the loop buffer is identified. An instruction at a branch destination location for the forward branch instruction is provided, by and from the loop buffer, immediately subsequent to providing the forward branch without introduction of stall cycles into execution.
- FIG. 1 shows a block diagram of a processor in accordance with various embodiments.
- FIG. 2 shows a block diagram of a fetch buffer in accordance with various embodiments.
- FIG. 3 shows an exemplary state diagram for buffer control logic in accordance with various embodiments.
- FIGS. 4-8 show instruction loops stored in a loop buffer in accordance with various embodiments.
- FIG. 9 shows a flow diagram for a method for loop buffering and execution in accordance with various embodiments.
- The terms “branch” and “jump” are used herein as equivalents to refer to a discontinuity in instruction retrieval and execution. Accordingly, the terms “loop jump” and “loop branch” are used as equivalents.
- Loop acceleration caches store instruction data and address information, and include address comparison logic for comparing fetch addresses with the stored address.
- the number of stored addresses and address comparators differs.
- conventional architectures implement separate associative cache ways that require additional address storage and comparators.
- Caches that store a large number of addresses and include a large number of address comparators can provide a high cache hit rate at the expense of high gate count and high energy consumption.
- Some caches may store a small number of addresses and include few address comparators thereby producing a lower cache hit rate with lower gate count and lower energy consumption.
- caches, including conventional specialized loop caches, are a compromise between increasing cache hit rate and reducing the number of included address comparators.
- the number of address comparators determines the number of nesting levels and branches supported in a cached loop.
- Embodiments of the present disclosure include a loop buffer formed in the instruction fetch buffer of a processor.
- the loop buffer includes pointer logic that controls fetching of loop instructions from the loop buffer.
- FIG. 1 shows a block diagram of a processor 100 in accordance with various embodiments.
- the processor 100 may be a general purpose microprocessor, a digital signal processor, a microcontroller, or other computing device that executes instructions retrieved from a memory device.
- the processor 100 includes a fetch unit 102, a decode unit 106, and an execution unit 108.
- the fetch unit 102 retrieves instructions from a storage device, such as a memory, for execution by the processor 100.
- the fetch unit 102 provides the retrieved instructions to the decode unit 106.
- the decode unit 106 examines the instructions received from the fetch unit 102 , and translates each instruction into controls suitable for operating the execution unit 108 , processor registers, and other components of the processor 100 to perform operations that effectuate the instructions. In some embodiments of the processor 100 , various operations associated with instruction decoding may be performed in the fetch unit 102 or another operational unit of the processor 100 .
- the decode unit 106 provides control signals to the execution unit 108 that cause the execution unit 108 to carry out the operations needed to execute each instruction.
- the execution unit 108 includes arithmetic circuitry, shifters, multipliers, registers, logical operation circuitry, etc. that are arranged to manipulate data values as specified by the control signals generated by the decode unit 106 .
- Some embodiments of the processor 100 may include multiple execution units that include the same or different data manipulation capabilities.
- the processor 100 may include various other components that have been omitted from FIG. 1 as a matter of clarity.
- embodiments of the processor 100 may include instruction and/or data caches, memory, communication devices, interrupt controllers, timers, clock circuitry, direct memory access controllers, and various other components and peripherals.
- the fetch unit 102 includes a fetch buffer 104 .
- the fetch buffer 104 provides storage for instructions pre-fetched from instruction storage, e.g., fetched from a memory device external to the processor 100 that stores instructions. By pre-fetching instructions, the processor 100 can provide stored instructions for execution without the delays often associated with fetching instructions from a memory device that may be unable to provide instructions at as high a rate as the processor 100 is able to execute the instructions.
- the fetch buffer 104, or a portion thereof, is arranged to operate as a loop buffer that recognizes and stores instructions of an instruction loop fetched from memory.
- FIG. 2 shows a block diagram of the fetch buffer 104 in accordance with various embodiments.
- the fetch buffer 104 includes instruction storage 202, pointers 204, pointer arithmetic logic 206, and fetch/loop control logic 208.
- the instruction storage 202 includes an array of storage cells, such as registers and/or memory devices, that store instructions retrieved from an instruction storage device, such as a memory external to the processor 100. Instructions stored in the instruction storage 202 are provided to the decode unit 106 for execution by the execution unit 108.
- the instruction storage 202 may include storage for any number of instructions. For example, embodiments of the instruction storage 202 may store 16, 32, 64, 128, or another number of instruction words.
- the storage cells of the instruction storage 202 may be of any width needed to store instructions executed by the processor 100 .
- the storage cells may be 16 bits in width if the processor 100 executes 16-bit instructions, 32 bits in width if the processor 100 executes 32-bit instructions, etc.
- the fetched instructions may be sequentially stored in the instruction storage 202 .
- the pointers 204 are registers that provide values for addressing the instruction storage 202 , and include a read pointer for reading instructions from the instruction storage 202 , a write pointer for writing instructions to the instruction storage 202 , and pointers that identify loop start and loop end locations in the instruction storage 202 .
- the pointers 204 may include any number of pointer registers. For example, four pointer registers comprising a read pointer, a write pointer, and two loop pointers may be included in the pointers 204 . The width of each pointer may be determined based on the number of the storage cells included in the instruction storage 202 .
- For example, if the instruction storage 202 includes 16 storage cells, then each pointer may be 4 bits wide; if the instruction storage 202 includes 32 storage cells, then each pointer may be 5 bits wide, etc.
- the pointers accessing the instruction storage 202 may be substantially smaller than the addresses used to access the memory from which the instructions are fetched into the instruction storage 202 .
- the address values used to access an external memory, and address values included in the instructions fetched may be 16 or more bits wide, while the pointers 204 may be 4 bits wide.
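The relationship between storage size and pointer width described above can be sketched as follows. This is a minimal illustration, not the patented circuit; the helper name `pointer_width` is hypothetical, and it assumes each pointer only has to index one storage cell of the instruction storage.

```python
def pointer_width(num_cells: int) -> int:
    """Bits needed to index num_cells storage cells (hypothetical helper)."""
    if num_cells < 1:
        raise ValueError("need at least one storage cell")
    # The highest index is num_cells - 1; its bit length is the pointer width.
    return max(1, (num_cells - 1).bit_length())


# Storage sizes mentioned in the description.
for cells in (16, 32, 64, 128):
    print(f"{cells} cells -> {pointer_width(cells)}-bit pointers")
```

Under this assumption a 16-cell storage needs 4-bit pointers, which is far narrower than a 16-bit (or wider) memory address.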
- the pointer arithmetic logic 206 is coupled to the pointers 204 , and includes circuitry for arithmetically manipulating the values stored in the pointers.
- the pointer arithmetic logic 206 may include adders, shifters, etc. for changing the value stored in a given one of the pointers 204 .
- the pointer arithmetic logic 206 may add an offset value to a pointer 204 to set the pointer to a branch destination in the instruction storage 202.
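As a rough sketch of the offset addition, the function below adds a signed branch offset to a buffer pointer. The name `add_offset` is hypothetical, and the wraparound (modulo the number of storage cells) is an assumption made for this illustration; the text does not say how the pointer arithmetic logic treats overflow.

```python
def add_offset(pointer: int, offset: int, num_cells: int) -> int:
    """Return the pointer moved by a signed offset inside the buffer.

    Assumption: the pointer wraps around at the end of the storage,
    as a fixed-width hardware register would.
    """
    return (pointer + offset) % num_cells


# A backward branch of 3 cells from cell 5 in a 16-cell storage.
print(add_offset(5, -3, 16))
```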
- the fetch/loop control logic 208 controls the operation of the fetch buffer 104 .
- At least a portion of the instruction storage 202 may be used to store instructions of an instruction loop and repetitively provide the instructions to the decode unit 106 for execution by the execution unit 108 .
- the fetch/loop control logic 208 may allocate a portion of the instruction storage 202 for use as a loop buffer and a portion of the instruction storage 202 for use as a pre-fetch buffer. Instructions stored in the pre-fetch buffer may be replaced by a newly fetched instruction after being provided to the decode unit 106 a single time. In contrast, instructions stored in the loop buffer may be replaced only after execution of an instruction loop including the instructions is complete.
- the fetch/loop control logic 208 includes logic that recognizes instructions included in an instruction loop and manages the forwarding of loop instructions to the decode unit 106 .
- the fetch/loop control logic 208 predecodes and examines the instructions written to and/or read from the instruction storage 202 to identify loop jump instructions and other instructions associated with instruction loops (e.g., loop identification instructions, forward branch instructions, call/return instructions, etc.).
- a loop identifier that identifies the start of the instruction loop may be included before or at the beginning of the instruction loop.
- the loop identifier may be a dedicated instruction or a field of an instruction.
- the fetch/loop control logic 208 may identify all instructions encountered as part of the instruction loop until a branch or jump instruction is encountered that redirects execution to the start of the loop, i.e., to the loop identifier or one sequential instruction after the loop identifier.
- the fetch/loop control logic 208 uses the pointers 204 and the pointer arithmetic logic 206 to manage instruction loops.
- the fetch/loop control logic 208 may initialize a pointer (loop start pointer) to the location of the loop identifier at the start of the instruction loop, and initialize a pointer (loop end pointer) to the location of the branch/jump instruction at the end of the instruction loop.
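The identifier-first approach above can be sketched as a scan over buffered instructions. Everything here is illustrative: the `(op, offset)` tuple encoding, the `"loop_id"` and `"jump"` opcode names, and the function `record_loop` are assumptions, not the instruction set of the disclosed processor.

```python
def record_loop(instructions):
    """Return (loop_start, loop_end) buffer indices, or None if no loop.

    instructions: list of (op, offset) tuples; offset is a signed
    instruction count for jumps and None otherwise (assumed encoding).
    """
    start = None
    for idx, (op, offset) in enumerate(instructions):
        if op == "loop_id":
            start = idx  # candidate loop start pointer
        elif start is not None and op == "jump" and offset is not None:
            dest = idx + offset
            # The loop ends at a jump back to the identifier or to the
            # instruction immediately after it.
            if dest in (start, start + 1):
                return start, idx
    return None


buffered = [("nop", None), ("loop_id", None), ("add", None),
            ("mul", None), ("jump", -2), ("nop", None)]
print(record_loop(buffered))
```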
- the fetch/loop control logic 208 may provide this instruction flow change by comparing the current read pointer for reading instruction words from the instruction storage 202 to the previously set loop end pointer. If equal, the read pointer will be updated to the loop start pointer for the next cycle, thereby eliminating stall cycles from the non-sequential execution of instructions.
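A minimal software model of this read-pointer comparison is shown below. It is a sketch under stated assumptions: the function name `run_loop` is hypothetical, and loop exit is modeled with a fixed iteration count, whereas real hardware would depend on the jump condition evaluated at run time.

```python
def run_loop(storage, loop_start, loop_end, iterations):
    """Return the instructions provided while the read pointer walks the loop."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    provided = []
    read_ptr = loop_start
    remaining = iterations
    while True:
        provided.append(storage[read_ptr])
        if read_ptr == loop_end:
            # Read pointer equals the loop end pointer: reload it with
            # the loop start pointer instead of advancing sequentially.
            remaining -= 1
            if remaining == 0:
                break
            read_ptr = loop_start
        else:
            read_ptr += 1
    return provided


storage = ["add", "mul", "sub", "jump LABEL_1", "post-loop"]
print(run_loop(storage, loop_start=0, loop_end=3, iterations=2))
```

Because the comparison only involves narrow buffer pointers, no memory address has to be compared or fetched when the loop wraps.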
- the fetch/loop control logic 208 identifies the branch/jump instruction of an instruction loop, and thereafter identifies the start of the instruction loop as the destination of the branch/jump.
- the loop branch instruction may include a field that identifies the branch as a loop branch.
- the fetch/loop control logic 208 may initialize a pointer (loop end pointer) to the location of the branch/jump instruction at the end of the instruction loop, and initialize a pointer (loop start pointer) to the location of the destination of the branch/jump instruction at the start of the instruction loop. Instructions executed between the loop start pointer and the loop end pointer may be tagged as instructions of the instruction loop.
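The jump-first approach can be sketched in the same style: find the loop branch, derive the loop start from its destination, and tag everything in between. The `(op, offset)` encoding and the rule "a jump with a negative offset is a loop branch" are assumptions for this example; the text instead allows the instruction itself to carry a field marking it as a loop branch.

```python
def find_loop(storage):
    """Return (loop_start, loop_end) from the first backward jump, else None."""
    for idx, (op, offset) in enumerate(storage):
        if op == "jump" and offset < 0:  # assumed marker of a loop branch
            return idx + offset, idx
    return None


def tag_loop_instructions(storage):
    """Return the buffer indices tagged as part of the instruction loop."""
    pointers = find_loop(storage)
    if pointers is None:
        return []
    loop_start, loop_end = pointers
    return list(range(loop_start, loop_end + 1))


storage = [("nop", 0), ("add", 0), ("mul", 0), ("jump", -2), ("nop", 0)]
print(find_loop(storage), tag_loop_instructions(storage))
```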
- the fetch buffer 104 may support any number of nested loops using only the buffer read pointer, buffer write pointer, loop start pointer, and loop end pointer disclosed herein.
- FIG. 3 shows an exemplary state diagram for loop control operations of the fetch/loop control logic 208 .
- the logic 208 enters the NORMAL_FETCH state, and while neither a loop start identifier is detected nor a loop jump is taken (302), instructions are fetched from memory, sequentially stored in the instruction storage 202, and tagged for replacement after execution.
- fetch/loop control logic 208 transitions to the LOOP_RECORD state.
- Pointers are set to identify the locations of the loop jump instruction and the jump destination instruction in the instruction storage 202 .
- Write and read pointers may be set to record and read the loop from the start of the loop (i.e., the loop jump destination). Instructions of the loop are loaded into the instruction storage 202 .
- If the instruction loop is aborted (i.e., a discontinuity such as an interrupt is encountered or a branch to a destination outside of the loop is taken), the LOOP_RECORD state is exited 308, and the NORMAL_FETCH state is re-entered.
- the instruction storage 202 may be cleared and reloaded with instructions fetched from memory at the new execution address.
- the LOOP_BUF_READ state is entered.
- the loop instructions are sequentially provided to the decode unit 106 until a jump/branch instruction is encountered, at which time the instruction at the destination pointer may be provided (e.g., in accordance with branch conditions).
- the fetch/loop control logic 208 can accelerate nested instruction loops, forward branches (conditionals, such as if-then-else constructs) located within an instruction loop, and other instruction flow discontinuities occurring in an instruction loop.
- the LOOP_BUF_READ state is exited, and the NORMAL_FETCH state is re-entered.
- the instruction storage 202 may be cleared and reloaded with post-loop instructions fetched from memory.
- state diagram 300 illustrates one example of loop buffer control
- some embodiments may implement other control methods. For example, if a loop identification flag is provided before or at the start of the instruction loop, then control logic 208 may begin recording loop instructions when the start of the loop is stored in the instruction storage 202 .
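The three states named above can be modeled as a small transition table. The state names come from the text; the event names (`loop_jump_taken`, `loop_aborted`, `loop_exited`) and the exact conditions attached to each transition are assumptions for this sketch, since the extracted description does not spell out every transition condition.

```python
from enum import Enum, auto


class State(Enum):
    NORMAL_FETCH = auto()
    LOOP_RECORD = auto()
    LOOP_BUF_READ = auto()


# (current state, event) -> next state; event names are hypothetical.
TRANSITIONS = {
    (State.NORMAL_FETCH, "loop_jump_taken"): State.LOOP_RECORD,
    (State.LOOP_RECORD, "loop_jump_taken"): State.LOOP_BUF_READ,
    (State.LOOP_RECORD, "loop_aborted"): State.NORMAL_FETCH,
    (State.LOOP_BUF_READ, "loop_aborted"): State.NORMAL_FETCH,
    (State.LOOP_BUF_READ, "loop_exited"): State.NORMAL_FETCH,
}


def step(state: State, event: str) -> State:
    """Return the next state; unlisted events leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


state = State.NORMAL_FETCH
for event in ("sequential", "loop_jump_taken", "loop_jump_taken", "loop_exited"):
    state = step(state, event)
    print(event, "->", state.name)
```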
- FIGS. 4-8 show instruction loops stored in a loop buffer portion of the fetch buffer 104 in accordance with various embodiments.
- FIG. 4 shows an instruction loop identified by LABEL_1. The instructions of the loop have been fetched and stored in the instruction storage 202 , and loop start and end pointers of the pointers 204 have been set to the locations of the instruction at LABEL_1 and the corresponding loop jump/branch instruction. As the loop is executed, the read pointer of the pointers 204 sequentially advances until equal to the loop end pointer (i.e., the read pointer points to the loop jump instruction), at which time the read pointer may be loaded with the value of the loop start pointer to fetch the instruction at LABEL_1.
- FIG. 5 shows a first instruction loop identified by LABEL_1 and a second instruction loop identified by LABEL_2 nested within the first instruction loop.
- the instructions of both loops have been fetched and stored in the instruction storage 202 , and pointers 204 have been set to the locations of the instruction at LABEL_1 and the corresponding loop jump/branch instruction.
- the fetch/loop control logic 208 can apply the pointer to the jump destination instruction (LABEL_1) to provide the instruction without stall cycles.
- the fetch/loop control logic 208 applies the pointer arithmetic logic 206 to update the read pointer according to the offset value provided in the loop jump instruction.
- the updated read pointer will point to the instruction at LABEL_2.
- FIG. 6 shows an instruction loop identified by LABEL_1 stored in the instruction storage 202 .
- the instruction loop includes a forward branch/jump associated with LABEL_2.
- the forward branch may result from a conditional construct, such as if-then.
- Pointers 204 have been set to the locations of the instructions at LABEL_1 and the corresponding loop jump.
- the fetch/loop control logic 208 handles the forward branch/jump within the loop by updating the read pointer in accordance with the offset value provided by the forward branch/jump instruction. When either jump/branch instruction is executed, the fetch/loop control logic 208 can apply the read pointer to the corresponding jump destination instruction to provide the instruction without stall cycles.
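The forward-branch handling of FIG. 6 can be illustrated with a short trace. The buffer contents, the `(name, offset)` encoding, and the `trace` function are hypothetical; branch decisions are supplied as a list of booleans because this sketch does not model condition evaluation.

```python
def trace(storage, start, decisions, steps):
    """Return the names of the instructions provided over a number of steps.

    storage: list of (name, offset); offset is None for non-branch
    instructions and a signed instruction count for branches/jumps.
    decisions: one boolean per branch encountered (True = taken).
    """
    decisions = iter(decisions)
    read_ptr = start
    provided = []
    for _ in range(steps):
        name, offset = storage[read_ptr]
        provided.append(name)
        if offset is not None and next(decisions):
            read_ptr += offset  # taken: apply the offset to the read pointer
        else:
            read_ptr += 1       # not a branch, or not taken: go sequentially
    return provided


loop = [
    ("LABEL_1: load", None),
    ("branch_if LABEL_2", +2),  # forward branch inside the loop
    ("add", None),
    ("LABEL_2: store", None),
    ("jump LABEL_1", -4),       # loop jump back to the start
]
# First pass takes the forward branch, second pass falls through.
print(trace(loop, 0, [True, True, False], 7))
```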
- FIG. 7 shows an instruction loop identified by LABEL_1.
- the instruction flow of the loop as stored in memory is shown, and the instructions as stored in the instruction storage 202 are shown.
- the instruction loop of FIG. 7 includes more instructions than can be concurrently stored in the instruction storage 202 .
- the fetch buffer 104 can accelerate execution of such instruction loops.
- the fetch/loop control logic 208 partitions the instruction storage 202 into a loop buffer portion and a pre-fetch buffer portion. Instructions located at the start of the loop are stored in the loop buffer portion, and instructions located at the end of the loop are fetched and executed from the pre-fetch buffer portion.
- the fetch/loop control logic 208 applies a pointer 204 to retrieve the destination instruction at LABEL_1 in the loop buffer portion, and initiates fetching of loop instructions immediately following the last instruction stored in the loop buffer portion into the pre-fetch buffer portion.
- additional loop instructions are fetched into the pre-fetch buffer portion.
- the pre-fetch buffer portion operates as a ring buffer where each instruction may be replaced after execution, while the start of the loop is retained in the loop buffer portion until looping is terminated.
- the fetch buffer 104 may provide loop acceleration (e.g., no stalls) for loops that are too large to be wholly stored in the instruction storage 202 .
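A back-of-the-envelope count of instruction memory reads shows why retaining only the start of an oversized loop still helps. This is a simplified model, not a measurement: the function `memory_fetches` is hypothetical, and it assumes that retained instructions are read from memory once while every instruction in the pre-fetch portion is re-fetched on each iteration.

```python
def memory_fetches(loop_len: int, loop_buf_cells: int, iterations: int) -> int:
    """Instruction memory reads for a loop under the simplified model."""
    retained = min(loop_len, loop_buf_cells)  # held in the loop buffer portion
    tail = loop_len - retained                # streamed through the pre-fetch portion
    return retained + iterations * tail


loop_len, cells, iterations = 24, 16, 10
print("with loop buffer:", memory_fetches(loop_len, cells, iterations))
print("without buffering:", loop_len * iterations)
```

With these example numbers the model gives 96 reads instead of 240; a loop that fits entirely in the loop buffer portion is read from memory only once.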
- FIG. 8 shows an instruction loop that includes a subroutine call.
- the fetch buffer 104 can accelerate the instruction loop and handle the subroutine call without introducing stall cycles.
- the fetch/loop control logic 208 identifies, and sets pointers to, the call instruction and the destination of the call instruction.
- the instructions of the subroutine may be loaded into the instruction storage 202 .
- the fetch/loop control logic 208 identifies, and sets pointers 204 to, the sub-routine return instruction and the return instruction destination.
- the fetch/loop control logic 208 applies the pointers 204 to redirect program flow without introduction of stall cycles.
- the call instruction and/or the return instruction may not be provided from the loop buffer for execution.
- the first instruction of the sub-routine may be provided for execution.
- the call instruction and/or the return instruction may be eliminated/overwritten in the instruction storage 202 so that during execution of the loop the cycles for call and return instruction execution are saved and the number of instructions stored in the instruction storage 202 is reduced.
- the instruction storage 202 may be partitioned into a loop buffer portion and pre-fetch buffer portion as explained with regard to FIG. 7 , and the subroutine fetched into the pre-fetch buffer portion for execution.
- the entirety of the instruction loop and sub-routine called may be stored in the instruction storage 202 .
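The embodiment in which call and return instructions are not provided for execution can be pictured as flattening the subroutine into the loop's instruction stream. The mnemonics `"call SUB"` and `"ret"` and the function `stream_without_call_return` are assumptions for this sketch, which handles a single, non-nested subroutine.

```python
def stream_without_call_return(loop_body, subroutine):
    """Return the instruction stream with call/return instructions elided."""
    stream = []
    for instr in loop_body:
        if instr == "call SUB":
            # Provide the subroutine body in place of the call, and drop
            # the return so execution continues after the call site.
            stream.extend(i for i in subroutine if i != "ret")
        else:
            stream.append(instr)
    return stream


loop_body = ["load", "call SUB", "store", "jump LABEL_1"]
subroutine = ["shift", "add", "ret"]
print(stream_without_call_return(loop_body, subroutine))
```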
- FIG. 9 shows a flow diagram for a method 900 for loop buffering and execution in accordance with various embodiments. Though depicted sequentially as a matter of convenience, at least some of the actions shown can be performed in a different order and/or performed in parallel. Additionally, some embodiments may perform only some of the actions shown.
- the instruction storage 202 is partitioned into a loop buffer and a pre-fetch buffer.
- the fetch/loop control logic 208 maintains instructions stored in the loop buffer portion until execution of an instruction loop is complete.
- the fetch/loop control logic 208 may replace instructions stored in the pre-fetch buffer portion after execution.
- the fetch/loop control logic 208 identifies an instruction loop (e.g., by identifying a loop start flag or loop branch instruction), and identifies the location in the instruction storage 202 of the loop branch instruction and the destination of the loop branch instruction (i.e., the loop start instruction).
- the fetch/loop control logic 208 sets one of the pointers 204 to the location of the loop branch instruction in the instruction storage 202, and sets one of the pointers 204 to the location of the loop start instruction in the instruction storage 202.
- the instruction loop is being executed from the fetch buffer 104 , and the fetch/loop control logic 208 provides the instruction at the loop branch destination address without introducing stall cycles in the processor pipeline.
- stall cycles may be introduced, but the number of stall cycles is reduced relative to fetching from external instruction memory.
- the fetch/loop control logic 208 identifies a forward branching instruction in the instruction loop.
- the forward branch may be provided as part of an if-then-else type conditional construct.
- the forward branch instruction is provided from the instruction storage for execution, the buffer read pointer is updated in accordance with the offset value provided in the forward branch instruction, and the instruction at the destination of the forward branch is provided for execution without introduction of stall cycles.
- the fetch/loop control logic 208 identifies a subroutine call instruction in the instruction loop.
- the fetch/loop control logic 208 sets a pointer 204 to the location of the call instruction in the instruction storage 202, sets a pointer 204 to the location of the destination of the call instruction in the instruction storage 202 (i.e., the start of the sub-routine being called), sets a pointer 204 to the location of the sub-routine return instruction in the instruction storage 202, and sets a pointer 204 to the location of the instruction following the call instruction (i.e., the instruction executed on return from the sub-routine).
- the fetch/loop control logic 208 provides the call instruction from the instruction storage 202 for execution, and subsequently provides the first instruction of the sub-routine (as indicated by the aforementioned pointer 204 ) for execution without introducing stall cycles. Similarly, fetch/loop control logic 208 provides the return instruction from the instruction storage 202 for execution, and subsequently provides the instruction following the call instruction (as indicated by the aforementioned pointer 204 ) for execution without introducing stall cycles. In some embodiments, rather than providing the call and/or return instruction for execution, the instruction at the buffer location corresponding to the target of the call or return instruction is provided for execution.
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Advance Control (AREA)
- Executing Machine-Instructions (AREA)
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US14/164,633 US9507600B2 (en) | 2014-01-27 | 2014-01-27 | Processor loop buffer |
| PCT/US2015/013149 WO2015113070A1 (en) | 2014-01-27 | 2015-01-27 | Processor loop buffer |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US14/164,633 US9507600B2 (en) | 2014-01-27 | 2014-01-27 | Processor loop buffer |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US20150212820A1 (en) | 2015-07-30 |
| US9507600B2 (en) | 2016-11-29 |
Family
ID=53679121
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US14/164,633 Active 2035-01-15 US9507600B2 (en) | 2014-01-27 | 2014-01-27 | Processor loop buffer |
Country Status (2)
| Country | Link |
|---|---|
| US (1) | US9507600B2 (en) |
| WO (1) | WO2015113070A1 (en) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11663008B2 (en) | 2019-03-11 | 2023-05-30 | Samsung Electronics Co., Ltd. | Managing memory device with processor-in-memory circuit to perform memory or processing operation |
| US20250138826A1 (en) * | 2023-10-27 | 2025-05-01 | Beijing Eswin Computing Technology Co., Ltd. | Processor, Instruction Fetching Method, and Computer System |
Families Citing this family (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10802828B1 (en) | 2018-09-27 | 2020-10-13 | Amazon Technologies, Inc. | Instruction memory |
| US11650821B1 (en) * | 2021-05-19 | 2023-05-16 | Xilinx, Inc. | Branch stall elimination in pipelined microprocessors |
| US20240394064A1 (en) * | 2022-07-13 | 2024-11-28 | Condor Computing Corporation | Apparatus and Method for Implementing a Loop Prediction of Multiple Basic Blocks |
Citations (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5809273A (en) * | 1996-01-26 | 1998-09-15 | Advanced Micro Devices, Inc. | Instruction predecode and multiple instruction decode |
| US5953512A (en) | 1996-12-31 | 1999-09-14 | Texas Instruments Incorporated | Microprocessor circuits, systems, and methods implementing a loop and/or stride predicting load target buffer |
| US6189092B1 (en) * | 1997-06-30 | 2001-02-13 | Matsushita Electric Industrial Co., Ltd. | Pipeline processor capable of reducing branch hazards with small-scale circuit |
| US20020178350A1 (en) | 2001-05-24 | 2002-11-28 | Samsung Electronics Co., Ltd. | Loop instruction processing using loop buffer in a data processing device |
| US6598155B1 (en) | 2000-01-31 | 2003-07-22 | Intel Corporation | Method and apparatus for loop buffering digital signal processing instructions |
| US20070113058A1 (en) | 2005-11-14 | 2007-05-17 | Texas Instruments Incorporated | Microprocessor with indepedent SIMD loop buffer |
| US20070239975A1 (en) * | 2006-04-07 | 2007-10-11 | Lei Wang | Programmable backward jump instruction prediction mechanism |
| US9274951B2 (en) * | 2013-05-31 | 2016-03-01 | Altera Corporation | Cache memory controller for accelerated data transfer |
- 2014-01-27: US application US14/164,633, patent US9507600B2 (en), status: Active
- 2015-01-27: WO application PCT/US2015/013149, patent WO2015113070A1 (en), status: Ceased
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11663008B2 (en) | 2019-03-11 | 2023-05-30 | Samsung Electronics Co., Ltd. | Managing memory device with processor-in-memory circuit to perform memory or processing operation |
| US12106107B2 (en) | 2019-03-11 | 2024-10-01 | Samsung Electronics Co., Ltd. | Memory device for processing operation, data processing system including the same, and method of operating the memory device |
| US20250138826A1 (en) * | 2023-10-27 | 2025-05-01 | Beijing Eswin Computing Technology Co., Ltd. | Processor, Instruction Fetching Method, and Computer System |
Also Published As
| Publication number | Publication date |
|---|---|
| WO2015113070A1 (en) | 2015-07-30 |
| US20150212820A1 (en) | 2015-07-30 |
Similar Documents
| Publication | Title |
|---|---|
| US6898699B2 (en) | Return address stack including speculative return address buffer with back pointers |
| US10248570B2 (en) | Methods, systems and apparatus for predicting the way of a set associative cache |
| US12455745B2 (en) | Processor subroutine cache |
| US9465615B2 (en) | Method and apparatus for branch prediction |
| US8943300B2 (en) | Method and apparatus for generating return address predictions for implicit and explicit subroutine calls using predecode information |
| EP0448499B1 (en) | Instruction prefetch method and system for branch-with-execute instructions |
| US5774710A (en) | Cache line branch prediction scheme that shares among sets of a set associative cache |
| US11163577B2 (en) | Selectively supporting static branch prediction settings only in association with processor-designated types of instructions |
| US11861367B2 (en) | Processor with variable pre-fetch threshold |
| US9507600B2 (en) | Processor loop buffer |
| CN101763249A (en) | Reducing branch checking for non-control flow instructions |
| WO2008067277A2 (en) | Methods and apparatus for recognizing a subroutine call |
| US6684319B1 (en) | System for efficient operation of a very long instruction word digital signal processor |
| US5146570A (en) | System executing branch-with-execute instruction resulting in next successive instruction being execute while specified target instruction is prefetched for following execution |
| US7155574B2 (en) | Look ahead LRU array update scheme to minimize clobber in sequentially accessed memory |
| US8266414B2 (en) | Method for executing an instruction loop and a device having instruction loop execution capabilities |
| US7519799B2 (en) | Apparatus having a micro-instruction queue, a micro-instruction pointer programmable logic array and a micro-operation read only memory and method for use thereof |
| US6289428B1 (en) | Superscaler processor and method for efficiently recovering from misaligned data addresses |
| JP2004519028A (en) | Computer instructions with instruction fetch control bits |
| WO2012132214A1 (en) | Processor and instruction processing method thereof |
| JPH10283185A (en) | Processor |
| JPH05257686A (en) | Instruction cache circuit |
Legal Events
| Code | Title | Description |
|---|---|---|
| AS | Assignment | Owner name: TEXAS INSTRUMENTS DEUTSCHLAND GMBH, GERMANY. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: WIENCKE, CHRISTIAN; LEDWA, RALPH; REICHEL, NORBERT; REEL/FRAME: 034723/0194. Effective date: 20140124 |
| STCF | Information on status: patent grant | Free format text: PATENTED CASE |
| MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY. Year of fee payment: 4 |
| AS | Assignment | Owner name: TEXAS INSTRUMENTS INCORPORATED, TEXAS. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: TEXAS INSTRUMENTS DEUTSCHLAND GMBH; REEL/FRAME: 055314/0255. Effective date: 20210215 |
| MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY. Year of fee payment: 8 |