US20080172529A1 - Novel context instruction cache architecture for a digital signal processor - Google Patents
- Publication number
- US20080172529A1 (application US11/623,760, US62376007A)
- Authority
- US
- United States
- Prior art keywords
- cache
- instruction
- instructions
- memory
- frequently executed
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0888—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using selective caching, e.g. bypass
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3802—Instruction prefetching
- G06F9/3808—Instruction prefetching for instruction reuse, e.g. trace cache, branch target cache
- G06F9/381—Loop buffering
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- the cache-hit advantage factor X/(2X−Y) can always be greater than 1. This confirms that the hit-ratio of the thrashing-aware cache architecture can always be greater than that of the conventional cache architecture.
- the thrashing-aware cache architecture gives a better hit-ratio than the conventional cache architecture when deploying a combination of caching the frequently executed instructions and exiting upon cache counter saturation, without increasing cache size or degrading cache-hit access time.
- the current frequently executed instructions are held in the instruction cache memory until a next set of frequently executed instructions in the program is identified and caching is enabled for it.
- caching of instructions is dynamically re-enabled upon encountering the next frequently executed instructions.
- the block diagram 200 includes an instruction cache memory 210 , an external memory 230 , and a computational unit 240 .
- the computational unit 240 includes a decoder logic circuit 250 , an N-bit up-counter 260 , an enabler/disabler logic circuit 270 , and a cache controller 280 .
- the instruction cache memory 210 is shown including SET 0 to SET 15 , wherein each SET includes two entries, for a total of 32 entries in the instruction cache memory 210 .
- the computational unit 240 coupled to the instruction cache memory 210 dynamically enables loading of instructions upon encountering frequently executed instructions. Further, the computational unit 240 dynamically disables loading the instructions upon encountering an exit point associated with the frequently executed instructions in a program.
- the N-bit up-counter 260 has a number of states that is equal to a predetermined number of entries in the instruction cache memory 210 .
- the decoder logic circuit 250 locates the current frequently executed instructions in the program.
- the enabler/disabler logic circuit 270 enables storing of the instructions associated with the located frequently executed instructions via the cache controller 280 .
- the N-bit up-counter 260 then increments upon storing each instruction in the instruction cache memory 210 .
- the enabler/disabler logic circuit 270 then disables the storing of the instructions in the instruction cache memory 210 via the cache controller 280 upon the N-bit up-counter 260 reaching a saturation point or upon encountering the exit point in the instructions associated with the frequently executed instructions before reaching the saturation point.
- the instruction cache memory 210 has a predetermined number of entries 205 .
- the N-bit up-counter 260 has a number of states that is equal to the predetermined number of entries in the instruction cache memory 210 . The N-bit up-counter 260 then increments a counter value for each instruction that is stored in the instruction cache memory 210 .
- the enabler/disabler logic circuit 270 then disables the storing of the instructions in the frequently executed instructions via the cache controller 280 upon the N-bit up-counter 260 reaching a counter value equal to the number of states in the N-bit up-counter 260 or upon encountering the exit point in the instructions before the counter value in the N-bit up-counter 260 becomes equal to the number of states in the N-bit up-counter 260 .
- The operation of the thrashing-aware cache architecture shown in FIG. 2 is described in more detail above with reference to the flowchart 100 shown in FIG. 1 .
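The enable/disable control described for FIG. 2 can be sketched as a small state machine. The Python model below is illustrative only (the class and method names are invented, not from the patent): detection of a frequently executed block enables caching and clears the counter, each cached instruction increments the counter, and either counter saturation or an exit point disables further loads.

```python
class ThrashAwareController:
    """Illustrative model of the N-bit up-counter (260) plus the
    enabler/disabler logic (270) described for FIG. 2."""

    def __init__(self, n_entries):
        self.n_entries = n_entries  # counter states == cache entries
        self.count = 0
        self.enabled = False

    def on_frequent_block_entry(self):
        # decoder logic (250) found a loop, call, or backward jump
        self.enabled = True
        self.count = 0

    def on_instruction_cached(self):
        if self.enabled:
            self.count += 1
            if self.count == self.n_entries:
                # counter saturation: the cache is full, stop loading
                self.enabled = False

    def on_exit_point(self):
        # loop termination or call return before saturation
        self.enabled = False
```

With a 4-entry cache, for example, the fifth and later instructions of a block are simply not loaded, so the first four stay resident for later iterations.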
- FIG. 3 illustrates an example method 300 for a self-configuring cache in a digital signal processor (DSP).
- this example method 300 begins by dynamically determining whether a current instruction in an executable program is coming from an external memory or an internal memory. Based on the determination at step 310 , the method 300 goes to step 320 and outputs an external execution-space control signal if the current instruction is coming from the external memory.
- a traditional instruction load enable signal is outputted so that the cache memory behaves like a traditional cache.
- the method 300 goes to step 330 and outputs an internal execution-space control signal if the current instruction is coming from the internal memory.
- at step 350 , the method 300 determines whether the fetch phase of the current instruction coincides with the memory access of a preceding load or store instruction. Based on the determination at step 350 , the method 300 goes to step 360 if the fetch phase of the current instruction coincides with the memory access of the preceding load or store instruction, and outputs a conflict instruction load enable signal so that the cache memory behaves like a conflict cache. This generally indicates a conflict condition.
- the method 300 goes to step 310 via step 355 to fetch a next current instruction and repeats steps 310 - 360 if the fetch phase of the current instruction does not coincide with the memory access of the preceding load or the store instruction.
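The decision flow of method 300 reduces to a small truth table. The sketch below is a hypothetical rendering of that flow (the function and signal names are invented for illustration): internal fetches are cached only on a program-bus conflict, while external fetches are always cached in the traditional manner.

```python
def cache_mode(from_internal_memory, fetch_overlaps_prev_access):
    """Which load-enable signal the controller asserts for the current
    instruction, following steps 310-360 of method 300 (illustrative)."""
    if from_internal_memory:
        if fetch_overlaps_prev_access:
            # program-bus conflict: behave like a conflict cache
            return "conflict_load_enable"
        # no conflict: nothing to cache, fetch the next instruction
        return None
    # instruction fetched from external memory: behave like a traditional cache
    return "traditional_load_enable"
```

This makes explicit that the same cache hardware is steered into one of two behaviors by two inputs: the execution space of the fetch and the presence of a bus conflict.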
- the block diagram 400 includes a cache memory 410 , an internal memory 420 , an external memory 430 , and a computational unit 440 .
- the computational unit 440 further comprises an execution-space decode logic circuit 450 and a cache control logic circuit 460 .
- the cache control logic circuit 460 includes a conflict instruction cache enabler 470 , a traditional instruction cache enabler 480 , a MUX 490 , and a cache controller 495 .
- the execution-space decode logic circuit 450 dynamically determines whether a current instruction in an executable program is coming from the external memory 430 or the internal memory 420 .
- the cache control logic circuit 460 then configures the cache memory 410 to behave like a traditional cache or a conflict cache based on an outcome of the determination by the execution-space decode logic circuit 450 .
- the cache control logic circuit 460 then transfers the current instruction to and between the cache memory 410 , the internal memory 420 and the external memory 430 based on the configured cache memory.
- the execution-space decode logic circuit 450 determines during run-time execution whether a current instruction in the executable program is coming from the external memory 430 or the internal memory 420 . The execution-space decode logic circuit 450 then outputs an external execution-space control signal if the current instruction is coming from the external memory 430 and outputs an internal execution-space control signal if the current instruction is coming from the internal memory 420 .
- the conflict instruction cache enabler 470 determines whether the current instruction in the executable program has a memory conflict condition and then outputs a conflict instruction load enable signal upon finding the memory conflict condition.
- the traditional instruction cache enabler 480 then enables a traditional instruction load enable signal for the current instruction in the executable program upon receiving the current instruction from the external memory 430 .
- the MUX 490 then outputs an instruction load enable signal via the cache controller 495 and configures the cache memory 410 to behave like a traditional cache or a conflict cache based on the instruction load enable signal.
- the instruction load enable signal then transfers the current instruction to and between the cache memory 410 , the internal memory 420 , and the external memory 430 based on the configuration of the cache memory 410 .
- the MUX 490 outputs the instruction load enable signal and enables the cache memory 410 to behave like a conflict cache via the cache controller 495 and transfers the current instruction to and between the internal memory 420 , the cache memory 410 and the computational unit 440 upon finding a memory conflict condition and receiving the internal execution-space control signal from the conflict instruction cache enabler 470 .
- the MUX 490 outputs the instruction load enable signal and enables the cache memory 410 to behave like a traditional cache via the cache controller 495 and transfers the current instruction, coming from the external memory 430 , to and between the cache memory 410 and the computation unit 440 upon receiving the current instruction from the external memory 430 and the traditional instruction load enable signal from the traditional instruction cache enabler 480 .
- FIGS. 1 and 3 include steps 110 - 170 and 310 - 360 that are arranged serially in the exemplary embodiments, other embodiments of the subject matter may execute two or more steps in parallel, using multiple processors or a single processor organized as two or more virtual machines or sub-processors. Moreover, still other embodiments may implement the steps as two or more specific interconnected hardware modules with related control and data signals communicated between and through the modules, or as portions of an application-specific integrated circuit. Thus, the exemplary process flow diagrams are applicable to software, firmware, and/or hardware implementations.
- the above thrashing-aware architecture increases the digital signal processor performance by reducing cache thrashing and increasing hit-ratio. Further, the above process lowers power dissipation by reducing loading of unwanted instructions into cache memory. Further, the above thrashing-aware process is suitable for caches of small sizes used in digital signal processors.
- the above-described self-configuring cache architecture facilitates significantly improving the cache functionality by using the same cache hardware as both a traditional cache and a conflict cache, thereby eliminating the need for two physically different caches in a DSP.
- the above-described context-switching self-configuring cache seamlessly switches from conflict cache to traditional cache and vice-versa without any user intervention.
- the above process uses the same cache hardware as a conflict cache to avoid resource conflicts during code execution from internal memory, and as a traditional instruction cache to improve performance during code execution from external memory, where there is no resource conflict.
- the above techniques can be implemented using an apparatus controlled by a processor where the processor is provided with instructions in the form of a computer program constituting an aspect of the above technique.
- a computer program may be stored in storage medium as computer readable instructions so that the storage medium constitutes a further aspect of the present subject matter.
- while FIG. 1 depicts a simple case of caching frequently executed instructions which are not nested to improve hit ratio, one can envision implementing the above-described process for nested loops and other such frequently executed instructions as well.
- the present subject matter can be implemented in a number of different embodiments, including various methods, a circuit, an I/O device, a system, and an article comprising a machine-accessible medium having associated instructions.
- FIGS. 1-4 are merely representational and are not drawn to scale. Certain proportions thereof may be exaggerated, while others may be minimized. FIGS. 1-4 illustrate various embodiments of the subject matter that can be understood and appropriately carried out by those of ordinary skill in the art.
Abstract
Improved thrashing-aware and self-configuring cache architectures reduce cache thrashing in a DSP without increasing cache size or degrading cache-hit access time. In one example embodiment, this is accomplished by selectively caching only the instructions having a higher probability of recurrence, considerably reducing cache thrashing.
Description
- The present invention relates to digital signal processors, and more particularly to real-time memory management for digital signal processors.
- A digital signal computer or digital signal processor (DSP) is a special purpose computer that is designed to optimize performance for digital signal processing applications, such as, for example, fast Fourier transforms, digital filters, image processing and speech recognition. DSP applications are characterized by real-time operation, high interrupt rates, and intensive numeric computations. In addition, DSP applications tend to be intensive in memory access operations and to require the input and output of large quantities of data. Thus, designs of DSPs may be quite different from those of general purpose processors.
- One approach that has been used in the architecture of DSPs is the Harvard architecture, which utilizes separate, independent program and data memories so that two memories may be accessed simultaneously. This permits instructions and data to be accessed in a single clock cycle. Frequently, the program occupies less memory space than data. To achieve full memory utilization, a modified Harvard architecture utilizes the program memory for storing both instructions and data. Typically, the program and data memories are interconnected to the core processor by separate program and data buses.
- When instructions and data are stored in the program memory, conflicts may arise in the fetching of instructions. Further, in the case of a Harvard architecture, the instruction fetch and the data access can take place in the same clock cycle, which can lead to a conflict on the program memory bus. In this scenario, instructions which can generally be fetched in a single clock cycle can stall for a cycle due to the conflict. This happens when the instruction fetch phase coincides with the memory access phase of a preceding load or store instruction on the program memory bus. Such instructions are cached in a conflict cache so that the next time the same instructions are encountered, they can be fetched from the conflict cache to avoid instruction fetch phase stalls. In addition to the conflict cache, a traditional instruction cache is also required for fetching instructions from the external main memory. This results in requiring two different cache architectures.
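As a rough illustration of the conflict condition, the following sketch (hypothetical names and a simplified pipeline model, not the patent's circuit) flags the instructions whose fetch coincides with the memory-access phase of a preceding load or store on the program memory bus — exactly the instructions a conflict cache would capture to avoid repeated stalls.

```python
def conflict_candidates(program):
    """program: list of (opcode, uses_program_memory_bus) pairs, in order.
    Returns the instructions whose fetch would stall on first execution
    and which are therefore worth holding in a conflict cache."""
    candidates = []
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_uses_pm_bus = prev
        # the next instruction's fetch overlaps this load/store's
        # memory-access phase on the program memory bus
        if prev_op in ("load", "store") and prev_uses_pm_bus:
            candidates.append(cur)
    return candidates
```

On a second pass over the same code, fetching these candidates from the conflict cache instead of program memory removes the stall cycles.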
- Further, conventional instruction cache architectures exploit the locality of code to maximize cache hits. Most cache architectures suffer from performance degradation due to cache thrashing, i.e., loading the cache with an instruction and then removing it, while it is still needed, before it can be used by the computer system. Cache thrashing is, of course, undesirable, as it reduces the performance gains.
- Conventional techniques reduce cache thrashing by increasing the cache size, increasing cache associativity, having a victim cache, and so on. However, these techniques come with overheads such as extra hardware, increased cache-hit access time, and/or higher software overhead. Another conventional technique identifies frequently executed instructions through code profiling and locks the cache through software to minimize cache thrashing. However, this technique requires additional overheads in terms of profiling of the code by the user and extra instructions in the code to lock the cache. Further, this can make the code very cumbersome.
- According to an aspect of the subject matter, there is provided a method for reducing cache thrashing in a DSP, comprising the steps of dynamically enabling caching of instructions upon encountering current frequently executed instructions in a program, and dynamically disabling the caching of the instructions upon encountering an exit point associated with the frequently executed instructions.
- According to another aspect of the subject matter, there is provided a method for self configuring a cache memory in a digital signal processor, comprising: determining during run-time execution of a program whether a current instruction is coming from an external main memory or an internal memory; outputting an execution-space control signal based on the determination; if the code is executed from internal memory, determining whether a fetch phase of the current instruction coincides with the memory access phase of a preceding load or store instruction on the program memory bus and, if so, outputting a conflict instruction load enable signal so that the cache memory behaves like a conflict cache and stores the current instruction upon receiving the execution-space control signal; and if the code is executed from external memory, enabling a traditional instruction load enable signal so that the cache memory behaves like a traditional cache and then stores the current instruction upon receiving the execution-space control signal.
- Embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings in which:
- FIG. 1 is a flowchart illustrating a method for reducing cache thrashing in a DSP according to an embodiment of the present subject matter.
- FIG. 2 illustrates a block diagram of a DSP cache memory according to an embodiment of the present subject matter, such as those shown in FIG. 1 .
- FIG. 3 is a flowchart illustrating a method for self configuring an instruction cache memory in a DSP according to an embodiment of the present subject matter.
- FIG. 4 illustrates a block diagram of a DSP cache memory according to an embodiment of the present subject matter, such as FIG. 3 .
- In the following detailed description of the various embodiments of the invention, reference is made to the accompanying drawings that form a part hereof, and in which are shown by way of illustration specific embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention, and it is to be understood that other embodiments may be utilized and that changes may be made without departing from the scope of the present invention. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present invention is defined only by the appended claims.
- The terms “cache”, “cache memory”, “instruction cache memory”, “conflict cache memory” are used interchangeably throughout the document. Also, the terms “thrashing” and “cache thrashing” are used interchangeably throughout the document. In addition, the terms “code”, “instructions”, and “program” are used interchangeably throughout the document. In addition, the term “current frequently executed instructions” means first encountered one or more frequently executed instructions in the program during run-time.
-
FIG. 1 illustrates anexample method 100 for reducing cache thrashing in a digital signal processor (DSP). Atstep 110, thisexample method 100 begins by dynamically identifying frequently executed instructions in a program during run-time. Exemplary frequently executed instructions in the program include a hardware loop, a nested hardware loop, a call, a backward jump, and the like. In some embodiments, the frequently executed instructions include instructions having higher probability of reoccurrence during run-time of the program. - At
step 120, current instructions are cached upon encountering the current frequently executed instructions in the program by dynamically enabling instruction cache memory. Generally, instruction cache memory is useful if same instruction is required again before the instruction is thrashed during run-time of the program. In some embodiments, the instruction cache is enabled only for those instructions which have higher probability of reoccurrence to reduce thrashing. - In some embodiments, caching of the instructions is dynamically disabled upon encountering an exit point in the current frequently executed instructions. The exit point refers to an exit found in frequently executed instructions, such as loop termination, call return, and the like. At
step 130, an N-bit up-counter is incremented upon caching each instruction in the current frequently executed instructions in the instruction cache memory. In these embodiments, the N-bit up-counter has a number of states that is equal to number of entries available in the instruction cache memory. - At
step 140, themethod 100 determines whether the exit point in the current frequently executed instructions before the N-bit up-counter reaching saturation. Based on the determination atstep 140, themethod 100 goes to step 150. Atstep 150, themethod 100 determines whether the N-bit up-counter has reached saturation. Based on the determination atstep 150, themethod 100 goesstep 120 if the N-bit up-counter has not reached the saturation and repeats steps 120-150. Based on the determination atstep 150 themethod 100 goes to step 160 and dynamically disables caching of the current frequently executed instructions if the N-bit up-counter has reached the saturation. In these embodiments, the N-bit up-counter saturation can signify that instruction cache memory is saturated with instructions. - Based on the determination at
step 140, the method 100 goes to step 160 and dynamically disables caching of the current frequently executed instructions if the exit point in the current frequently executed instructions occurs before the N-bit up-counter reaches saturation. - At
step 170, the method 100 determines whether there are next frequently executed instructions. Based on the determination at step 170, the method 100 goes to step 120 and repeats steps 120-170 if there are next frequently executed instructions in the program. In these embodiments, the instruction cache memory is dynamically re-enabled upon encountering the next frequently executed instructions. Based on the determination at step 170, the method 100 goes to step 110 and repeats steps 110-170 if there is no other frequently executed set of instructions in the program. - In the case of a hardware loop or other such frequently occurring code including more instructions than the instruction cache memory can hold, thrashing can occur, causing a performance loss. As described above, the proposed thrashing-aware scheme dynamically disables caching of the current frequently executed instructions once the instruction cache memory reaches saturation. The instruction cache memory is re-enabled when either the loop including the frequently executed instructions terminates or a nested loop starts executing during run-time. This technique improves performance by reducing thrashing and increasing the hit-ratio during run-time of the program. The above-described thrashing-aware technique is generally suitable for small instruction cache memories.
- For example, in the case of a DSP having a small cache memory of 32 entries, the cache memory is very susceptible to thrashing if every instruction is cached during run-time. Thrashing can lead to performance loss in the case of big loops (i.e., loop sizes greater than about 32 instructions) or call/Cjump-based subroutines longer than about 32 instructions. To avoid this problem, a 5-bit up-counter counting 32 ACAM (address content addressable memory) loads can be used in conjunction with instruction-based caching, including a decoder logic circuit that decodes the frequently executed instructions, such as loops, calls, nested loops, negative jumps, and the like, as described above, to increase the cache hit-ratio. In this scenario, the 5-bit up-counter starts incrementing, upon encountering frequently executed instructions, with every instruction load to the instruction cache memory until it reaches saturation at 32 loads. The instruction cache memory is disabled for that particular loop/call upon saturation of the 5-bit up-counter.
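- As a check on this example, the counter-gated policy above can be modeled with a short simulation. The following Python sketch is illustrative only (the function and its arguments are not part of the disclosed hardware); it counts cache hits for a loop of Y instructions executed N times against an X-entry cache whose loading is disabled once the up-counter saturates:

```python
def thrashing_aware_hits(X, Y, N):
    """Count hits for an X-entry instruction cache running a Y-instruction
    loop N times, with loading gated by an up-counter that saturates at X."""
    cache = set()       # cached instruction addresses
    counter = 0         # up-counter value
    caching = True      # enabled on entering the frequently executed code
    hits = 0
    for _ in range(N):          # N iterations of the loop
        for pc in range(Y):     # Y instructions per iteration
            if pc in cache:
                hits += 1
            elif caching:
                cache.add(pc)   # load instruction into the cache
                counter += 1
                if counter == X:
                    caching = False  # counter saturated: disable caching
    return hits

# 32-entry cache (5-bit counter), 48-instruction loop, 10 iterations:
# only the first 32 instructions are cached, then hit on every later pass.
print(thrashing_aware_hits(32, 48, 10))  # 288, i.e. X*(N-1)
```

A conventional cache modeled the same way would thrash on the 48-instruction loop, which is the performance loss the equations below quantify.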
- The following equations illustrate the benefits of using the above-described technique to reduce thrashing and increase hit-ratio during run-time of a program:
- Consider a case where an instruction cache memory has "X" entries and a frequently occurring set of instructions (code segment) of length "Y" occurs "N" times.
- For Conventional Cache Architecture:
- If "Y"<"X", then the hit-ratio=(N−1)/N
- If “X”<“Y”<“2X”, then the hit-ratio=(Y−(Y−X)*2)(N−1)/NY
- If “Y”>“2X”, then the Hit-ratio=0
- For Thrashing-Aware Cache Architecture:
- If "Y"<"X", then the hit-ratio=(N−1)/N
- If “Y”>“X”, then the hit-ratio=X(N−1)/NY
- Now for "X"<"Y"<"2X",
- The cache-hit advantage factor for the thrashing-aware cache architecture over the conventional cache architecture
-
=X/(Y−(Y−X)*2) -
=X/(2X−Y) - It can be seen that for "X"<"Y"<"2X", the cache-hit advantage factor X/(2X−Y) is always greater than 1. This confirms that the hit-ratio for the thrashing-aware cache architecture is always greater than that of the conventional cache architecture.
- Similarly, for cases where "Y">"2X", the conventional cache architecture returns 0 hits, whereas the thrashing-aware cache architecture can continue to return "X" hits per iteration.
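- The piecewise expressions above translate directly into code. A minimal Python sketch (the helper names are illustrative, not from the disclosure) that also checks the X/(2X−Y) advantage factor:

```python
def conventional_hit_ratio(X, Y, N):
    """Hit-ratio of a conventional X-entry cache over N runs of a
    Y-instruction code segment (per the equations above)."""
    if Y < X:
        return (N - 1) / N
    if Y < 2 * X:
        return (2 * X - Y) * (N - 1) / (N * Y)  # Y-(Y-X)*2 == 2X-Y
    return 0.0                                   # Y > 2X: every access misses

def thrashing_aware_hit_ratio(X, Y, N):
    """Hit-ratio with caching disabled after the first X loads."""
    if Y < X:
        return (N - 1) / N
    return X * (N - 1) / (N * Y)

X, Y, N = 32, 48, 10                 # the "X" < "Y" < "2X" regime
advantage = thrashing_aware_hit_ratio(X, Y, N) / conventional_hit_ratio(X, Y, N)
print(advantage)                     # 2.0, matching X/(2X - Y) = 32/16
```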
- The above example clearly illustrates that the thrashing-aware cache architecture gives a better hit-ratio than the conventional cache architecture by combining caching of the frequently executed instructions with exiting upon cache counter saturation, without increasing cache size or degrading cache-hit access time. In some embodiments, the current frequently executed instructions are held in the instruction cache memory until caching of the next frequently executed instructions in the program is identified and enabled. In these embodiments, caching of instructions is dynamically re-enabled upon encountering the next frequently executed instructions.
- Referring now to
FIG. 2, there is illustrated an example block diagram 200 of the DSP thrashing-aware cache architecture. As shown in FIG. 2, the block diagram 200 includes an instruction cache memory 210, an external memory 230, and a computational unit 240. Further as shown in FIG. 2, the computational unit 240 includes a decoder logic circuit 250, an N-bit up-counter 260, an enabler/disabler logic circuit 270, and a cache controller 280. Furthermore, the instruction cache memory 210 is shown including SET 0 to SET 15, wherein each SET includes two entries, making a total of 32 entries in the instruction cache memory 210. - In operation, the
computational unit 240 coupled to the instruction cache memory 210 dynamically enables loading of instructions upon encountering frequently executed instructions. Further, the computational unit 240 dynamically disables loading of the instructions upon encountering an exit point associated with the frequently executed instructions in a program. - In some embodiments, the N-bit up-
counter 260 has a number of states that is equal to a predetermined number of entries in the instruction cache memory 210. In these embodiments, the decoder logic circuit 250 locates the current frequently executed instructions in the program. Also, in these embodiments, the enabler/disabler logic circuit 270 enables storing of the instructions associated with the located frequently executed instructions via the cache controller 280. The N-bit up-counter 260 then increments upon storing each instruction in the instruction cache memory 210. The enabler/disabler logic circuit 270 then disables the storing of the instructions in the instruction cache memory 210 via the cache controller 280 upon the N-bit up-counter 260 reaching a saturation point or upon encountering the exit point in the instructions associated with the frequently executed instructions before reaching the saturation point. - In some embodiments, the
instruction cache memory 210 has a predetermined number of entries 205. Also, in these embodiments, the N-bit up-counter 260 has a number of states that is equal to the predetermined number of entries in the instruction cache memory 210. The N-bit up-counter 260 then increments a counter value for each instruction that is stored in the instruction cache memory 210. The enabler/disabler logic circuit 270 then disables the storing of the instructions of the frequently executed instructions via the cache controller 280 upon the N-bit up-counter 260 reaching a counter value equal to the number of states in the N-bit up-counter 260 or upon encountering the exit point in the instructions before the counter value becomes equal to the number of states in the N-bit up-counter 260. - The operation of the thrashing-aware cache architecture shown in
FIG. 2 is described above in more detail with reference to the flowchart 100 shown in FIG. 1. -
FIG. 3 illustrates an example method 300 for a self-configuring cache in a digital signal processor (DSP). At step 310, this example method 300 begins by dynamically determining whether a current instruction in an executable program is coming from an external memory or an internal memory. Based on the determination at step 310, the method 300 goes to step 320 and outputs an external execution-space control signal if the current instruction is coming from the external memory. At step 340, a traditional instruction load enable signal is output so that the cache memory behaves like a traditional cache. - Based on the determination at
step 310, the method 300 goes to step 330 and outputs an internal execution-space control signal if the current instruction is coming from the internal memory. At step 350, the method determines whether the fetch phase of the current instruction coincides with the memory access of a preceding load or store instruction. Based on the determination at step 350, the method 300 goes to step 360 if the fetch phase of the current instruction coincides with the memory access of the preceding load or store instruction and outputs a conflict instruction load enable signal so that the cache memory behaves like a conflict cache. This generally indicates a conflict condition. Based on the determination at step 350, the method 300 goes to step 310 via step 355 to fetch a next current instruction and repeats steps 310-360 if the fetch phase of the current instruction does not coincide with the memory access of the preceding load or store instruction. - Referring now to
FIG. 4, there is illustrated an example block diagram 400 of the DSP self-configuring cache architecture. As shown in FIG. 4, the block diagram 400 includes a cache memory 410, an internal memory 420, an external memory 430, and a computational unit 440. As shown in FIG. 4, the computational unit 440 further comprises an execution-space decode logic circuit 450 and a cache control logic circuit 460. Further as shown in FIG. 4, the cache control logic circuit 460 includes a conflict instruction cache enabler 470, a traditional instruction cache enabler 480, a MUX 490, and a cache controller 495. - In operation, the execution-space
decode logic circuit 450 dynamically determines whether a current instruction in an executable program is coming from the external memory 430 or the internal memory 420. The cache control logic circuit 460 then configures the cache memory 410 to behave like a traditional cache or a conflict cache based on the outcome of the determination by the execution-space decode logic circuit 450. The cache control logic circuit 460 then transfers the current instruction to and between the cache memory 410, the internal memory 420, and the external memory 430 based on the configured cache memory. - In some embodiments, the execution-space
decode logic circuit 450 determines, during run-time execution of the executable program, whether a current instruction is coming from the external memory 430 or the internal memory 420. The execution-space decode logic circuit 450 then outputs an external execution-space control signal if the current instruction is coming from the external memory 430 and outputs an internal execution-space control signal if the current instruction is coming from the internal memory 420. - In some embodiments, the conflict
instruction cache enabler 470 determines whether the current instruction in the executable program has a memory conflict condition and then outputs a conflict instruction load enable signal upon finding the memory conflict condition. The traditional instruction cache enabler 480 then enables a traditional instruction load enable signal for the current instruction in the executable program upon receiving the current instruction from the external memory 430. The MUX 490 then outputs an instruction load enable signal via the cache controller 495 and configures the cache memory 410 to behave like a traditional cache or a conflict cache based on the instruction load enable signal. The instruction load enable signal then transfers the current instruction to and between the cache memory 410, the internal memory 420, and the external memory 430 based on the configuration of the cache memory 410. - In some embodiments, the
MUX 490 outputs the instruction load enable signal, enables the cache memory 410 to behave like a conflict cache via the cache controller 495, and transfers the current instruction to and between the internal memory 420, the cache memory 410, and the computational unit 440 upon finding a memory conflict condition and receiving the internal execution-space control signal from the conflict instruction cache enabler 470. In these embodiments, the MUX 490 outputs the instruction load enable signal, enables the cache memory 410 to behave like a traditional cache via the cache controller 495, and transfers the current instruction, coming from the external memory 430, to and between the cache memory 410 and the computational unit 440 upon receiving the current instruction from the external memory 430 and the traditional instruction load enable signal from the traditional instruction cache enabler 480. - Although the
flowcharts FIGS. 1 and 3 include steps 110-170 and 310-360 that are arranged serially in the exemplary embodiments, other embodiments of the subject matter may execute two or more steps in parallel, using multiple processors or a single processor organized as two or more virtual machines or sub-processors. Moreover, still other embodiments may implement the steps as two or more specific interconnected hardware modules with related control and data signals communicated between and through the modules, or as portions of an application-specific integrated circuit. Thus, the exemplary process flow diagrams are applicable to software, firmware, and/or hardware implementations. - The above thrashing-aware architecture increases the digital signal processor performance by reducing cache thrashing and increasing hit-ratio. Further, the above process lowers power dissipation by reducing loading of unwanted instructions into cache memory. Further, the above thrashing-aware process is suitable for caches of small sizes used in digital signal processors.
- The above-described self-configuring cache architecture significantly improves the cache functionality by using the same cache hardware as both a traditional cache and a conflict cache, thereby eliminating the need for two physically different caches in a DSP. The above-described context-switching self-configuring cache seamlessly switches between conflict cache and traditional cache, and vice versa, without any user intervention. The above process uses the same cache hardware as a conflict cache to avoid resource conflicts during code execution from internal memory and as a traditional instruction cache to improve performance during code execution from external memory, where there is no resource conflict.
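- The mode selection performed by the self-configuring cache (steps 310-360, and the MUX of FIG. 4) reduces to a small decision table. The following Python sketch of that control logic is illustrative only; the signal and function names are not taken from the disclosure:

```python
def cache_mode(from_external_memory, fetch_conflicts_with_load_store):
    """Select the instruction-load-enable behavior per method 300:
    external code uses the cache as a traditional cache; internal code
    uses it as a conflict cache only when the instruction fetch collides
    with the memory access of a preceding load/store."""
    if from_external_memory:
        return "traditional"   # steps 320/340: traditional load enable
    if fetch_conflicts_with_load_store:
        return "conflict"      # steps 330/350/360: conflict load enable
    return "bypass"            # no conflict: simply fetch next instruction

print(cache_mode(True, False))   # traditional
print(cache_mode(False, True))   # conflict
```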
- The above techniques can be implemented using an apparatus controlled by a processor where the processor is provided with instructions in the form of a computer program constituting an aspect of the above technique. Such a computer program may be stored in a storage medium as computer-readable instructions so that the storage medium constitutes a further aspect of the present subject matter.
- Although the flowchart shown in
FIG. 1 depicts a simple case of caching the frequently executed instructions which are not nested to improve hit ratio, one can envision implementing the above-described process for nested loops and other such frequently executed instructions as well. - The above description is intended to be illustrative, and not restrictive. Many other embodiments will be apparent to those skilled in the art. The scope of the subject matter should therefore be determined by the appended claims, along with the full scope of equivalents to which such claims are entitled.
- As shown herein, the present subject matter can be implemented in a number of different embodiments, including various methods, a circuit, an I/O device, a system, and an article comprising a machine-accessible medium having associated instructions.
- Other embodiments will be readily apparent to those of ordinary skill in the art. The elements, algorithms, and sequence of operations can all be varied to suit particular requirements. The operations described above with respect to the methods illustrated in
FIGS. 1, 2, and 4 can be performed in a different order from those shown and described herein. -
FIGS. 1-4 are merely representational and are not drawn to scale. Certain proportions thereof may be exaggerated, while others may be minimized. FIGS. 1-4 illustrate various embodiments of the subject matter that can be understood and appropriately carried out by those of ordinary skill in the art. - In the foregoing detailed description of the embodiments of the invention, various features are grouped together in a single embodiment for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments of the invention require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive subject matter lies in less than all features of a single disclosed embodiment. Thus the following claims are hereby incorporated into the detailed description of the embodiments of the invention, with each claim standing on its own as a separate preferred embodiment.
Claims (21)
1. A method for reducing cache thrashing in a digital signal processor (DSP), comprising:
dynamically enabling caching of instructions upon encountering current frequently executed instructions in a program; and
dynamically disabling the caching of the instructions upon encountering an exit point in the frequently executed instructions.
2. The method of claim 1 , further comprising:
dynamically identifying the current frequently executed instructions during run-time of the program.
3. The method of claim 2 , further comprising:
holding the current frequently executed instructions in instruction cache memory until identifying and enabling caching of the instructions in next frequently executed instructions.
4. The method of claim 1, wherein the frequently executed instructions comprise instructions selected from the group consisting of a hardware loop, a nested hardware loop, a call, and a backward jump.
5. The method of claim 1 , wherein disabling the caching of the instructions upon encountering an exit point associated with the frequently executed instructions comprises:
incrementing an N-bit up-counter upon caching each of the instructions associated with the current frequently executed instructions into the instruction cache memory, wherein the N-bit up-counter has a number of states equal to the number of entries available in the instruction cache memory; and
dynamically disabling the caching of the instructions associated with the current frequently executed instructions into the instruction cache memory upon the N-bit up-counter reaching a counter value equal to the number of states in the N-bit up-counter or upon encountering an exit point, associated with the frequently executed instructions, before the counter value becomes equal to the number of states in the N-bit up-counter.
6. The method of claim 1 , further comprising:
dynamically re-enabling caching of instructions upon encountering next frequently executed instructions.
7. An article comprising:
a storage medium having instructions, that when executed by a computing platform, result in execution of a method for reducing cache thrashing comprising:
dynamically enabling caching of instructions upon encountering current frequently executed instructions in a program; and
dynamically disabling the caching of the instructions upon encountering an exit point in the frequently executed instructions.
8. The article of claim 7 , further comprising:
dynamically identifying the current frequently executed instructions during run-time.
9. The article of claim 8 , further comprising:
holding the instructions in instruction cache memory until identifying a next frequently executed instructions and enabling caching of the instructions in the next frequently executed instructions.
10. The article of claim 7, wherein the frequently executed instructions comprise instructions selected from the group consisting of a hardware loop, a nested hardware loop, a call, and a backward jump.
11. The article of claim 7 , wherein disabling the caching of the instructions upon encountering an exit point associated with the frequently executed instructions comprises:
incrementing an N-bit up-counter upon caching each of the instructions into the instruction cache memory, wherein the N-bit up-counter has a number of states equal to the number of entries available in the instruction cache memory; and
dynamically disabling the caching of the instructions into the instruction cache memory upon the N-bit up-counter reaching a counter value equal to the number of states in the N-bit up-counter or upon encountering an exit point, associated with the frequently executed instructions, before the counter value becomes equal to the number of states in the N-bit up-counter.
12. A digital signal processor, comprising:
an instruction cache memory; and
a computational unit coupled to the instruction cache memory to dynamically enable loading of instructions upon encountering frequently executed instructions in a program and to dynamically disable loading of instructions upon encountering an exit point associated with the frequently executed instructions.
13. The digital signal processor of claim 12 , wherein the computational unit comprises:
an N-bit up-counter having a number of states that is equal to a predetermined number of entries in the instruction cache memory;
a decoder logic circuit that locates the current frequently executed instructions in the program;
a cache controller; and
an enabler/disabler logic circuit that enables caching of the instructions associated with the located current frequently executed instructions via the cache controller, wherein the N-bit up-counter increments upon storing each instruction in the instruction cache memory, and wherein the enabler/disabler circuit disables the caching of the instructions in the instruction cache memory via the cache controller upon the N-bit up-counter reaching a saturation point or upon encountering an exit point in the instructions associated with the frequently executed instructions before reaching the saturation point.
14. The digital signal processor of claim 13 , wherein the instruction cache memory has a predetermined number of entries, wherein the N-bit up-counter has a number of states that is equal to the predetermined number of entries in the internal cache memory, wherein the N-bit up-counter increments a counter value for each instruction stored in the instruction cache memory, and wherein the enabler/disabler logic circuit disables the storing of the instructions via the cache controller upon the N-bit up-counter reaching a counter value equal to the number of states in the N-bit up-counter or upon encountering an exit point, associated with the frequently executed instructions, before the counter value becomes equal to the number of states in the N-bit up-counter.
15. The digital signal processor of claim 12, wherein the frequently executed instructions comprise instructions selected from the group consisting of a hardware loop, a nested hardware loop, a call, and a backward jump.
16. A self-configuring cache architecture for a digital signal processor, comprising:
cache memory;
an internal memory;
an external memory; and
a computational unit comprising:
an execution-space decode logic circuit that dynamically determines whether a current instruction in an executable program is coming from an external memory or an internal memory; and
a cache control logic circuit that configures the cache memory to behave like a traditional cache or a conflict cache based on the outcome of the determination, wherein the cache control logic circuit transfers the current instruction to and between the cache memory, the internal memory, and the external memory based on the configuration of the cache memory.
17. The self-configuring cache architecture of claim 16, wherein the execution-space decode logic circuit determines, during run-time execution of the executable program, whether a current instruction is coming from the external memory or the internal memory and then outputs an external execution-space control signal if the current instruction is coming from the external memory and outputs an internal execution-space control signal if the current instruction is coming from the internal memory.
18. The self-configuring cache architecture of claim 17 , wherein the cache control logic circuit comprises:
a cache controller;
a conflict instruction cache enabler that determines whether the current instruction in the executable program has a memory conflict condition and then outputs a conflict instruction load enable signal upon finding the memory conflict condition;
a traditional instruction cache enabler that enables a traditional instruction load enable signal for the current instruction in the executable program upon receiving the current instruction from the external memory; and
a MUX, coupled to the execution-space decode logic circuit, the conflict instruction cache enabler, and the traditional instruction cache enabler, that outputs an instruction load enable signal via the cache controller to configure the cache memory to behave like a traditional cache or a conflict cache based on the instruction load enable signal, wherein the instruction load enable signal transfers the current instruction to and between the cache memory, the internal memory, and the external memory based on the configuration of the cache memory.
19. The self-configuring cache architecture of claim 18 , wherein the MUX outputs the instruction load enable signal and enables the cache memory to behave like a conflict cache via the cache controller and transfers the current instruction to and between the internal memory, cache memory and the computational unit upon finding the memory conflict condition and receiving the internal execution-space control signal.
20. The self-configuring cache architecture of claim 19 , wherein the MUX outputs the instruction load enable signal and enables the cache memory to behave like a traditional cache via the cache controller and transfers the current instruction, coming from the external memory, to and between the cache memory and the computation unit upon receiving the current instruction from the external memory and the traditional instruction load enable signal from the traditional instruction cache enabler.
21. A method for self configuring a cache memory in a digital signal processor, comprising:
determining during run-time execution of a program whether a current instruction is coming from an external memory or an internal memory;
outputting an external execution-space control signal or an internal execution-space control signal based on the determination;
determining whether a fetch phase of the current instruction coincides with the memory access phase of a preceding load or store instruction on program memory bus;
if so, outputting a conflict instruction load enable signal so that the cache memory behaves like a conflict cache and stores the current instruction in the cache memory upon receiving the internal execution-space control signal; and
outputting a traditional instruction load enable signal so that the cache memory behaves like a traditional cache and then stores the current instruction in the cache memory upon receiving the external execution-space control signal.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/623,760 US20080172529A1 (en) | 2007-01-17 | 2007-01-17 | Novel context instruction cache architecture for a digital signal processor |
US12/835,319 US8219754B2 (en) | 2007-01-17 | 2010-07-13 | Context instruction cache architecture for a digital signal processor |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/623,760 US20080172529A1 (en) | 2007-01-17 | 2007-01-17 | Novel context instruction cache architecture for a digital signal processor |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/835,319 Division US8219754B2 (en) | 2007-01-17 | 2010-07-13 | Context instruction cache architecture for a digital signal processor |
Publications (1)
Publication Number | Publication Date |
---|---|
US20080172529A1 true US20080172529A1 (en) | 2008-07-17 |
Family
ID=39618651
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/623,760 Abandoned US20080172529A1 (en) | 2007-01-17 | 2007-01-17 | Novel context instruction cache architecture for a digital signal processor |
US12/835,319 Active US8219754B2 (en) | 2007-01-17 | 2010-07-13 | Context instruction cache architecture for a digital signal processor |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/835,319 Active US8219754B2 (en) | 2007-01-17 | 2010-07-13 | Context instruction cache architecture for a digital signal processor |
Country Status (1)
Country | Link |
---|---|
US (2) | US20080172529A1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120179954A1 (en) * | 2007-03-22 | 2012-07-12 | Research In Motion Limited | Device and method for improved lost frame concealment |
US8767501B2 (en) | 2012-07-17 | 2014-07-01 | International Business Machines Corporation | Self-reconfigurable address decoder for associative index extended caches |
CN104699624A (en) * | 2015-03-26 | 2015-06-10 | 中国人民解放军国防科学技术大学 | FFT (fast Fourier transform) parallel computing-oriented conflict-free storage access method |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8600317B2 (en) * | 2012-03-15 | 2013-12-03 | Broadcom Corporation | Linearization signal processing with context switching |
NL2020848B1 (en) * | 2018-05-01 | 2019-11-12 | Marel Poultry B V | System for processing slaughter products, and method for adjusting the mutual positioning of product carriers of such a system. |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6272599B1 (en) * | 1998-10-30 | 2001-08-07 | Lucent Technologies Inc. | Cache structure and method for improving worst case execution time |
US20060195573A1 (en) * | 2003-02-28 | 2006-08-31 | Bea Systems, Inc. | System and method for creating resources in a connection pool |
US20060206874A1 (en) * | 2000-08-30 | 2006-09-14 | Klein Dean A | System and method for determining the cacheability of code at the time of compiling |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5710907A (en) * | 1995-12-22 | 1998-01-20 | Sun Microsystems, Inc. | Hybrid NUMA COMA caching system and methods for selecting between the caching modes |
US6173371B1 (en) * | 1997-04-14 | 2001-01-09 | International Business Machines Corporation | Demand-based issuance of cache operations to a processor bus |
US7039756B2 (en) * | 2003-04-28 | 2006-05-02 | Lsi Logic Corporation | Method for use of ternary CAM to implement software programmable cache policies |
- 2007-01-17: US application 11/623,760 filed, published as US20080172529A1 (abandoned)
- 2010-07-13: US application 12/835,319 filed, issued as US8219754B2 (active)
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6272599B1 (en) * | 1998-10-30 | 2001-08-07 | Lucent Technologies Inc. | Cache structure and method for improving worst case execution time |
US20060206874A1 (en) * | 2000-08-30 | 2006-09-14 | Klein Dean A | System and method for determining the cacheability of code at the time of compiling |
US20060195573A1 (en) * | 2003-02-28 | 2006-08-31 | Bea Systems, Inc. | System and method for creating resources in a connection pool |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120179954A1 (en) * | 2007-03-22 | 2012-07-12 | Research In Motion Limited | Device and method for improved lost frame concealment |
US8848806B2 (en) * | 2007-03-22 | 2014-09-30 | Blackberry Limited | Device and method for improved lost frame concealment |
US9542253B2 (en) | 2007-03-22 | 2017-01-10 | Blackberry Limited | Device and method for improved lost frame concealment |
US8767501B2 (en) | 2012-07-17 | 2014-07-01 | International Business Machines Corporation | Self-reconfigurable address decoder for associative index extended caches |
CN104699624A (en) * | 2015-03-26 | 2015-06-10 | 中国人民解放军国防科学技术大学 | FFT (fast Fourier transform) parallel computing-oriented conflict-free storage access method |
Also Published As
Publication number | Publication date |
---|---|
US8219754B2 (en) | 2012-07-10 |
US20110010500A1 (en) | 2011-01-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7461237B2 (en) | Method and apparatus for suppressing duplicative prefetches for branch target cache lines | |
US6925643B2 (en) | Method and apparatus for thread-based memory access in a multithreaded processor | |
US8219754B2 (en) | Context instruction cache architecture for a digital signal processor | |
US20080270758A1 (en) | Multiple thread instruction fetch from different cache levels | |
US20150106598A1 (en) | Computer Processor Employing Efficient Bypass Network For Result Operand Routing | |
US5710913A (en) | Method and apparatus for executing nested loops in a digital signal processor | |
US9170816B2 (en) | Enhancing processing efficiency in large instruction width processors | |
US7596683B2 (en) | Switching processor threads during long latencies | |
US20020194466A1 (en) | Repeat instruction with interrupt | |
JP2005514678A (en) | Multi-threaded processor with efficient processing for centralized device applications | |
US20040098540A1 (en) | Cache system and cache memory control device controlling cache memory having two access modes | |
US7290089B2 (en) | Executing cache instructions in an increased latency mode | |
US20040168039A1 (en) | Simultaneous Multi-Threading Processor circuits and computer program products configured to operate at different performance levels based on a number of operating threads and methods of operating | |
US7870364B2 (en) | Reconfigurable apparatus and method for providing multiple modes | |
US5710914A (en) | Digital signal processing method and system implementing pipelined read and write operations | |
US20070300044A1 (en) | Method and apparatus for interfacing a processor and coprocessor | |
US9317438B2 (en) | Cache memory apparatus, cache control method, and microprocessor system | |
US7353337B2 (en) | Reducing cache effects of certain code pieces | |
CN110825442B (en) | Instruction prefetching method and processor | |
US20070300042A1 (en) | Method and apparatus for interfacing a processor and coprocessor | |
KR101239272B1 (en) | A dual function adder for computing a hardware prefetch address and an arithmetic operation value | |
JP4067063B2 (en) | Microprocessor | |
US7925862B2 (en) | Coprocessor forwarding load and store instructions with displacement to main processor for cache coherent execution when program counter value falls within predetermined ranges | |
US7302524B2 (en) | Adaptive thread ID cache mechanism for autonomic performance tuning | |
CN111475203B (en) | Instruction reading method for processor and corresponding processor |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment | Owner name: ANALOG DEVICES, INC., MASSACHUSETTS; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: RINGE, TUSHAR P; GIRI, ABHIJIT; REEL/FRAME: 018762/0929; SIGNING DATES FROM 20070105 TO 20070109 |
STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |