WO2014052561A1 - Methods and apparatus for managing page crossing instructions with different cacheability - Google Patents
- Publication number
- WO2014052561A1 (PCT/US2013/061876)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- instruction
- cacheable
- instructions
- cache
- page
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0877—Cache access modes
- G06F12/0886—Variable-length word access
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0888—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using selective caching, e.g. bypass
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30145—Instruction analysis, e.g. decoding, instruction word fields
- G06F9/30149—Instruction analysis, e.g. decoding, instruction word fields of variable length instructions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3802—Instruction prefetching
- G06F9/3816—Instruction alignment, e.g. cache line crossing
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/10—Address translation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/45—Caching of specific data in cache memory
- G06F2212/452—Instruction code
Definitions
- the present invention relates generally to techniques to improve efficiency in a processor which processes instructions having a variety of lengths, and, more particularly, to advantageous techniques for identifying instructions that cross boundaries between cacheable and non-cacheable memory and allowing this entire instruction to be stored in a cache line with other cacheable instructions.
- a number of processors are designed to execute instructions of different lengths, such as 8-bit, 16-bit, 32-bit, and 64-bit instructions, for example.
- Programs for such a processor may contain a combination of these different length instructions chosen from a variable-length instruction set architecture.
- a processor may also have a hierarchical memory configuration with multi-levels of caches and may include an instruction cache, a data cache, and system memory, for example.
- the instruction cache may be configured to store and access a plurality of instructions together in cache lines. In a processor architecture supporting 16-bit and 32-bit instructions, 32-bit instructions may be stored unaligned in a cache line.
- a 32-bit instruction having its first 16-bit half-word stored in an odd 16-bit half-word address is considered not aligned.
- a 256-bit cache line may store eight 32-bit instructions, or sixteen 16-bit instructions, or a combination of both 16-bit and 32-bit instructions.
- a cache line having a mix of 16-bit and 32-bit instructions may have the last 32-bit instruction crossing between two cache lines.
- a virtual memory system may be used that partitions the memory into pages, such as 4 kilobyte (4k byte) pages.
- the last 32-bit instruction in a cache line that crosses between two cache lines may also cross a page boundary.
- Each page may be assigned different attributes, which may include, for example, whether information stored on the page is cacheable or not cacheable.
- an instruction split across a cache line and across a page boundary may be subject to conflicting page attributes.
- all instructions except the last instruction in the cache line may be from a first exemplary page having attributes that are cacheable, while the last instruction split across the cache line and the page boundary may have an attribute indicating a first part is cacheable while a second part is not cacheable.
- Such conflicts may be difficult to resolve without affecting the performance of the majority of instructions in the cache line identified with the boundary splitting last instruction.
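The boundary conditions described above can be checked with simple address arithmetic. The following is a minimal sketch, assuming the 256-bit (32-byte) cache line and 4k byte page sizes used as examples in the text; the function names `crosses_boundary` and `classify` are hypothetical and do not come from the patent.

```python
LINE_BYTES = 32      # 256-bit cache line, the example size used in the text
PAGE_BYTES = 4096    # 4k byte page, the example size used in the text


def crosses_boundary(addr: int, size: int, unit: int) -> bool:
    """Return True if the size-byte item starting at addr spans two unit-byte blocks."""
    return addr // unit != (addr + size - 1) // unit


def classify(addr: int, size: int) -> str:
    """Classify an instruction by the largest boundary it crosses."""
    # The page size is a multiple of the line size, so every page crossing
    # is also a cache line crossing.
    if crosses_boundary(addr, size, PAGE_BYTES):
        return "page-crossing"
    if crosses_boundary(addr, size, LINE_BYTES):
        return "line-crossing"
    return "contained"
```

For instance, a 32-bit instruction whose first half-word sits in the last two bytes of a page is classified as page crossing, and only in that case can its two halves carry different cacheability attributes.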
- embodiments of the present invention recognize that performance can be improved by storing cacheable instructions in a cache line identified with a page boundary splitting last instruction.
- An embodiment of the invention recognizes that a need exists for a method to manage page crossing instructions with different cacheability.
- An indication is set for an ending portion of an instruction that was fetched from a first page of non-cacheable instructions and established with a beginning portion of the instruction that was fetched from a second page of cacheable instructions in a cache line having cacheable instructions, wherein the instruction crosses a cache line boundary.
- the indication is detected in a fetch pipeline when hitting on the established cache line to set a non-cacheable flag to indicate that the instruction cannot be executed from the instruction cache, wherein the instruction is received but not executed from the cache based on the non-cacheable flag. At least the ending portion of the instruction is refetched from memory bypassing the cache in response to the non-cacheable flag to combine with the beginning portion of the instruction, wherein the instruction is reconstructed for execution.
- An instruction cache is configured to store cacheable instructions and an instruction having a beginning portion that is cacheable and an ending portion that is non-cacheable and that crosses a cache line boundary at the end of a cache line.
- An indicator circuit is configured to store in one or more bits an indication that execution permission for the instruction is denied, wherein the instruction is identified as a non-cacheable instruction.
- A fetch pipeline is coupled to a processor and configured to detect the indication when hitting on a fetch group of instructions that contains the non-cacheable instruction, wherein the non-cacheable instruction is received but not executed from the cache in response to the indication.
- An instruction cache is configured to store cacheable instructions and an instruction having a beginning portion that is cacheable and an ending portion that is non-cacheable, that crosses a page boundary, and a cache line boundary at the end of a cache line.
- An indicator circuit is configured to store an indication that execution permission for the instruction is denied, wherein the instruction is identified as a non-cacheable instruction.
- a fetch pipeline is coupled to a processor and configured to detect the indication when hitting on a fetch group of instructions that contains the non-cacheable instruction, wherein the non-cacheable instruction is refetched from system memory for execution bypassing the cache in response to the indication.
- Another embodiment addresses a computer readable non-transitory medium encoded with computer readable program data and code for operating a system.
- An indication is set for an ending portion of an instruction that was fetched from a first page of non-cacheable instructions and established with a beginning portion of the instruction that was fetched from a second page of cacheable instructions in a cache line having cacheable instructions, wherein the instruction crosses a cache line boundary.
- the indication is detected in a fetch pipeline when hitting on the established cache line to set a non-cacheable flag to indicate that the instruction cannot be executed from the instruction cache, wherein the instruction is received but not executed from the cache based on the non-cacheable flag.
- At least the ending portion of the instruction is refetched from memory bypassing the cache in response to the non-cacheable flag to combine with the beginning portion of the instruction, wherein the instruction is reconstructed for execution.
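The three steps above (set the indication, detect it on a hit, refetch the ending portion) can be illustrated in software. This is a rough sketch under stated assumptions, not the claimed hardware: `LineEntry`, `establish`, `fetch_on_hit`, and the `refetch_end` callback are hypothetical names, and a 32-bit instruction is assumed to be split into a high-order beginning half and a low-order ending half.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class LineEntry:
    begin_half: int            # cacheable beginning portion kept in the cache line
    end_half: Optional[int]    # stored copy of the ending portion, if any
    exec_denied: bool          # the indication: do not execute from the cache


def establish(begin_half: int, end_half: int, end_page_cacheable: bool) -> LineEntry:
    """Establish the line and set the indication if the ending page is non-cacheable."""
    return LineEntry(begin_half, end_half, exec_denied=not end_page_cacheable)


def fetch_on_hit(entry: LineEntry, refetch_end: Callable[[], int]) -> Tuple[int, bool]:
    """Return (instruction, bypassed_cache) for a hit on the established line."""
    if entry.exec_denied:
        # Non-cacheable flag: ignore the stored copy and refetch the ending
        # portion from memory, bypassing the cache.
        end_half = refetch_end()
        return (entry.begin_half << 16) | end_half, True
    return (entry.begin_half << 16) | entry.end_half, False
```

In this sketch the other cacheable instructions in the line are unaffected; only the line crossing instruction pays the cost of the refetch.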
- Means is utilized for storing cacheable instructions and an instruction having a beginning portion that was fetched from a first page of cacheable instructions and an ending portion that was fetched from a second page of non-cacheable
- Means is provided for indicating that execution permission for the instruction is denied, wherein the instruction is identified as a non-cacheable instruction. Means is also provided for detecting the indication when hitting on a fetch group of instructions that contains the non-cacheable instruction, wherein the non-cacheable instruction is refetched from system memory for execution bypassing the cache in response to the indication.
- a further embodiment addresses an apparatus for controlling execution of page crossing instructions with different cacheability.
- An instruction cache is configured to store cacheable instructions and an instruction having a beginning portion that is non-cacheable and an ending portion that is cacheable and that crosses a cache line boundary at the beginning of a cache line.
- An indicator circuit is configured to store in one or more bits an indication that execution permission for the instruction is denied, wherein the instruction is identified as a non-cacheable instruction.
- a fetch pipeline is coupled to a processor and configurable to detect the indication when hitting on a fetch group of instructions that contains the non-cacheable instruction, wherein the non-cacheable instruction is received but not executed from the cache in response to the indication.
- FIG. 1 is a block diagram of a particular embodiment of a device including a processor complex having an instruction cache that supports instructions which cross cache line boundaries and paged memory boundaries;
- FIG. 2 illustrates a processor complex having a processor, a level 1 instruction cache (L1 Icache), an L1 data cache (Dcache), a level 2 cache (L2 cache), and a system memory in accordance with an embodiment of the invention;
- FIG. 3A illustrates an exemplary program segment containing varying length instructions of 16 and 32 bits;
- FIG. 3B illustrates exemplary L1 Icache lines containing instructions from the program segment 300 of FIG. 3A;
- FIG. 4A illustrates a paged virtual memory system having an instruction translation look aside buffer (ITLB) and a physical memory in accordance with an embodiment of the invention;
- FIG. 4B illustrates a virtual to physical address translation subsystem having a line crossing indicator in the L1 Icache tags in accordance with an embodiment of the invention;
- FIG. 5 illustrates an exemplary two way set associative Icache circuit having a line crossing instruction and a supporting line crossing indicator in accordance with an embodiment of the invention; and
- FIG. 6 illustrates a process for managing page crossing instructions with different cacheability in accordance with an embodiment of the invention.
- Computer program code or "program code" for being operated upon or for carrying out operations according to the teachings of the invention may be written in a high level programming language such as C, C++, JAVA®, Smalltalk, JavaScript®, Visual Basic®, TSQL, Perl, or in various other programming languages.
- Programs for the target processor architecture may also be written directly in the native assembler language.
- a native assembler program uses instruction mnemonic representations of machine level binary instructions.
- Program code or computer readable non-transitory medium as used herein refers to machine language code such as object code whose format is understandable by a processor.
- FIG. 1 is a block diagram of a particular embodiment of a device 100 (e.g., a communication device) including a processor complex 110 having an instruction cache that supports instructions which cross cache line boundaries and paged memory boundaries.
- the device 100 may be a wireless electronic device and include the processor complex 110 coupled to a system memory 112 having computer executable instructions 118.
- the system memory 112 may include the system memory 227 of FIG. 2 or the system memory 452 of FIG. 4B.
- The processor complex 110 may include a processor 111 and an integrated memory subsystem 114 having a level 1 instruction cache (L1 Icache) 122, an external tag (xTag) circuit 126, and a cache controller circuit 128.
- the integrated memory subsystem 114 supports a paged memory organization having one or more pages in program memory that may be specified and identified as non-cacheable.
- the processor 111 may include the processor 210 of FIG. 2 or the processor pipeline 442 of FIG. 4B.
- The integrated memory subsystem 114 may also include an L1 data cache and a level 2 unified cache (not shown), such as the L1 data cache 214 and the L2 instruction and data cache 226 of FIG. 2 or the L2 cache 450 of FIG. 4B.
- The L1 Icache 122 may include the L1 Icache 218 of FIG. 2 or the L1 Icache 448 of FIG. 4B, as described in more detail below.
- the xTag circuit 126 may also include external permission bits (xPbits) 130 to provide an override indication that controls execution of an instruction, as described in more detail below with regard to the xTag circuit 447 and xPbits 449 of FIG. 4B.
- the integrated memory subsystem 114 may be included in the processor complex 110 or may be implemented as one or more separate devices or circuitry (not shown) external to the processor complex 110.
- The processor complex 110 includes any of the circuits and systems of FIGS. 2, 3B, 4A, 4B, and 5, and operates in accordance with any of the embodiments illustrated in or associated with FIG. 3A and FIG. 6, or any combination thereof.
- The L1 Icache 122, the xTag circuit 126, and the cache controller circuit 128 are accessible within the processor complex 110, and the processor 111 is configured to access data or program instructions stored in the memories of the integrated memory subsystem 114 or in the system memory 112.
- a camera interface 134 is coupled to the processor complex 110 and also coupled to a camera, such as a video camera 136.
- a display controller 140 is coupled to the processor complex 110 and to a display device 142.
- a coder/decoder (CODEC) 144 can also be coupled to the processor complex 110.
- a speaker 146 and a microphone 148 can be coupled to the CODEC 144.
- a wireless interface 150 can be coupled to the processor complex 110 and to a wireless antenna 152 such that wireless data received via the antenna 152 and wireless interface 150 can be provided to the processor 111.
- the processor 111 may be configured to execute computer executable instructions 118 stored in a non-transitory computer-readable medium, such as the system memory 112, that are executable to cause a computer, such as the processor 111, to execute a program, such as the program segment 300 of FIG. 3A.
- the computer executable instructions 118 are further executable to cause the processor 111 to process instructions that access the memories of the integrated memory subsystem 114 and the system memory 112.
- the processor complex 110, the display controller 140, the system memory 112, the CODEC 144, the wireless interface 150, and the camera interface 134 are included in a system-in-package or system-on-chip device 104.
- an input device 156 and a power supply 158 are coupled to the system-on-chip device 104.
- the display device 142, the input device 156, the speaker 146, the microphone 148, the wireless antenna 152, the video camera 136, and the power supply 158 are external to the system-on-chip device 104.
- each of the display device 142, the input device 156, the speaker 146, the microphone 148, the wireless antenna 152, the video camera 136, and the power supply 158 can be coupled to a component of the system-on-chip device 104, such as an interface or a controller.
- the device 100 in accordance with embodiments described herein may be incorporated in a variety of electronic devices, such as a set top box, an entertainment unit, a navigation device, a communications device, a personal digital assistant (PDA), a fixed location data unit, a mobile location data unit, a mobile phone, a cellular phone, a computer, a portable computer, tablets, a monitor, a computer monitor, a television, a tuner, a radio, a satellite radio, a music player, a digital music player, a portable music player, a video player, a digital video player, a digital video disc (DVD) player, a portable digital video player, any other device that stores or retrieves data or computer instructions, or any combination thereof.
- FIG. 2 is an illustration of a processor complex 200 having a processor 210 which executes instructions of different lengths fetched from a memory hierarchy 204.
- The memory hierarchy 204 includes a level 1 (L1) data cache 214, a memory management unit (MMU) 220 comprising an instruction translation lookaside buffer (ITLB) 217, an L1 instruction cache (Icache) 218, an external tag (xTag) circuit 219, a cache controller circuit 221, a write control circuit 222, a level 2 instruction and data cache (L2 cache) 226, and system memory 227.
- the xTag circuit 219 is external to the Icache 218, which it is associated with, allowing functions of the xTag circuit 219 to be added to the processor complex 200 without modification of storage arrays in the Icache 218.
- the processor complex 200 may be suitably employed in hardware components of device 100 of FIG. 1 for executing program code. Peripheral devices which may connect to the processor complex are not shown for clarity of discussion of the present invention.
- the various components of the processing complex 200 may be
- ASIC application specific integrated circuit
- FPGA field programmable gate array
- the processor 210 retrieves instructions and data from the caches in a hierarchical fashion. For example, the processor 210 fetches an instruction by generating a fetch output 228 comprising a virtual fetch address and an operating mode.
- the operating mode may include an identification of a 32-bit instruction only mode, a 16-bit instruction only mode, a mixed 16-bit instruction and 32-bit instruction mode, other operating modes, and the like.
- Such a processor operating mode state indicator is controlled by a program in operation on the processor.
- the processor's instruction set includes instructions encoded in multiple length formats, where longer instructions are conventionally a multiple of the shortest instruction format length available in the variable length instruction set.
- the processor may include a separate instruction alignment pipeline stage and split the decode operation into a predecode operation and a decode pipeline stage.
- The predecode operation may be suitably hidden from normal pipeline execution by providing the predecode operation during L1 Icache miss processing. L1 Icache miss processing occurs when the fetched instruction is not found in the L1 Icache and must be fetched from higher levels of the memory hierarchy.
- The predecode operation stores predecode information along with the fetched instructions in the L1 instruction cache. Such predecode operations and operations of the xTag circuit 219 are controlled by the write control circuit 222.
- In operation, the processor 210 generates a virtual address which is translated by the ITLB 217 to a physical fetch address that is used to access the L1 Icache 218 to determine if an addressed instruction is present in the L1 Icache by use of a match mechanism. If no match is found for the addressed instruction in the L1 Icache 218, a miss occurs. Miss information 230 is sent to the write control circuit 222, which may also include a predecoder, and the processor 210 makes an access request 232 to the L2 cache 226. With an instruction hit in the L2 cache 226, an L2 cache line containing the desired instruction is output on a first port (portA) 234 to the write control circuit 222.
- During miss processing, the write control circuit 222 partially decodes the instructions fetched from the L2 cache and provides instructions, predecoded bits associated with the instructions, and tag information such as execute permission bits on output 238 to the L1 Icache 218, with the instruction also passed to the processor 210.
- The processor 210 accesses the L1 data cache 214 to determine if the addressed data is present. If no match is found for the fetched data in the L1 data cache 214, a miss occurs and the L2 cache 226 is accessed next. In both L1 cache cases, if the instruction or data is found to be present in the L1 instruction or L1 data cache (referred to as hitting in the cache), then the instruction and data are read directly from their respective L1 cache on outputs 240 and 244. If a miss occurs for the L2 cache access, the instruction and data are provided from the system memory 227.
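The hierarchical lookup order described above (L1, then L2, then system memory) can be sketched as follows. The dictionaries standing in for the caches and the function name `fetch` are hypothetical, and filling the L1 cache on a miss served by system memory is an assumption of this sketch; the text only describes the L1 fill on an L2 hit.

```python
from typing import Dict, Tuple


def fetch(addr: int, l1: Dict[int, str], l2: Dict[int, str],
          memory: Dict[int, str]) -> Tuple[str, str]:
    """Return (value, level that supplied it), searching L1, then L2, then memory."""
    if addr in l1:
        return l1[addr], "L1"          # hit: read directly from the L1 cache
    if addr in l2:
        value, level = l2[addr], "L2"  # L1 miss, L2 hit
    else:
        value, level = memory[addr], "memory"  # miss in both cache levels
    l1[addr] = value                   # miss processing writes the L1 cache
    return value, level
```

A second fetch of the same address then hits in the L1 cache, which is the behavior the predecode-on-miss scheme relies on to hide predecode latency.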
- Fig. 3A illustrates an exemplary program segment 300 that may suitably contain varying length instructions of 16 and 32 bits.
- the exemplary program segment 300 may suitably be stored in the memory hierarchy 204 of the processor complex 200. It should be noted that although for illustration purposes the program segment is assumed to be retrieved from one or more cache lines, the teachings of the invention are applicable to any memory device storing the program segment where an instruction may span a storage segment boundary. Since a cache line may have a fixed length, a program segment may span the boundary of a cache line and thus may have instructions which are split across the cache line boundary.
- the program segment 300 includes instructions 302 which come from a variable length instruction set consisting of 16-bit and 32-bit instructions.
- processor 210 may use 16-bit and 32-bit instruction formats for multiple types of instructions and may support several modes of operation that specify and restrict instruction type usage.
- processor 210 may have a first mode of operation that specifies only 32-bit instructions may be used and a second mode of operation that specifies that a combination of 16-bit and 32-bit instructions may be used.
- While processors may have multiple modes of operation, for the purposes of clarity of discussion of the present invention, the description of the exemplary processor 210 is primarily limited to the second mode of operation described above.
- program relative byte indicators 304 represent the byte location in a cache line where an instruction begins and indirectly indicate the size of the instruction.
- the ADD R5, R4, R3 instruction 306 begins at relative byte position 00 and ends at byte position 01.
- ADD R5, R4, R3 instruction 306 is a 16-bit instruction.
- ADD instruction 309 is also 16 bits long.
- the load (LOAD) instruction 307, the LOAD instruction 308, and the store (STORE) instruction 310 are 32-bits long.
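The relationship between the relative byte indicators and the instruction sizes can be worked through numerically. This is a small sketch, assuming the five instructions are stored contiguously in reference-numeral order starting at relative byte 00; only the first range (bytes 00 to 01 for the ADD instruction 306) is stated in the text, and the remaining ranges are derived from the stated sizes. The labels and the function name `byte_ranges` are hypothetical.

```python
from typing import List, Tuple

# (label, size in bits), using the sizes given for instructions 306-310
PROGRAM = [
    ("ADD 306", 16),
    ("LOAD 307", 32),
    ("LOAD 308", 32),
    ("ADD 309", 16),
    ("STORE 310", 32),
]


def byte_ranges(program: List[Tuple[str, int]]) -> List[Tuple[str, int, int]]:
    """Return (label, first byte, last byte) for each instruction."""
    ranges = []
    offset = 0
    for label, bits in program:
        nbytes = bits // 8
        ranges.append((label, offset, offset + nbytes - 1))
        offset += nbytes  # the next instruction begins where this one ends
    return ranges
```

The difference between consecutive starting bytes gives each instruction's length, which is how the byte indicators indirectly indicate instruction size.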
- a cache line size may vary in different processor implementations depending, for example, upon choices made in the design of the processor and memory hierarchy based on the fabrication technology used.
- The L2 cache 226 may use a 512-bit cache line and the L1 Icache 218 may use a smaller cache line, such as a 128-bit or a 256-bit cache line, for example.
- the indicated cache line size is exemplary and larger or smaller cache line sizes are not precluded. It is also noted that for illustrative purposes, the program segment 300 has been shown starting at the relative address 00. It will be appreciated, that such a program segment 300 may be located beginning at various points in a cache line and may span multiple cache lines.
- FIG. 3B illustrates exemplary L1 Icache lines 320 containing instructions from the program segment 300 of FIG. 3A.
- An exemplary first L1 Icache line 322 and an exemplary second L1 Icache line 326 are adjacent cache lines in the L1 Icache 218 of FIG. 2.
- The first L1 Icache line 322 comprises 16-bit fields 330, 333, 334, and 336 and a 16-bit extension field 338.
- The first L1 Icache line 322 is associated with a tag field 323 and control flags Cn 324 which may include a cacheable indicator (L) and execute permission bits, such as a user execute (Ux) bit and a privilege execute (Px) bit, for example.
- The second L1 Icache line 326 comprises 16-bit fields 340, 342, 343, and 344, and a 16-bit extension field 346.
- The second L1 Icache line 326 is associated with a tag field 327 and control flags Cn 328 which may include a cacheable indicator (L) and execute permission bits, such as a user execute (Ux) bit and a privilege execute (Px) bit associated with instructions stored in the second L1 Icache line 326.
- The instructions of program segment 300 of FIG. 3A may be located in the first L1 Icache line 322 beginning with the 16-bit ADD R5, R4, R3 instruction 306 of FIG. 3A stored in the 16-bit field 330.
- the 32-bit LOAD instruction 307 is stored in a 32-bit field 332 comprising the two 16-bit fields 333 and 334.
- the 16-bit field 333 contains the high order 16-bits of the LOAD instruction 307 and the adjacent 16-bit field 334 contains the low order 16-bits of the LOAD instruction 307.
- The next instruction in the first L1 Icache line 322 is the 32-bit LOAD instruction 308 which is stored across two instruction cache lines.
- The high order 16 bits of the LOAD instruction 308 are stored in the 16-bit field 336 in the first L1 Icache line 322.
- The low order 16 bits of the LOAD instruction 308 are stored in the 16-bit field 340 in the second L1 Icache line 326.
- a copy of the low order 16-bits of the LOAD instruction 308 is stored in the 16-bit extension field 338.
- The ADD R8, R6, R7 instruction 309 and the STORE instruction 310, both of FIG. 3A, are stored in the 16-bit fields 342-344 in the second L1 Icache line 326 in similar fashion to segments 330 and 332 of the first L1 Icache line 322. It is also noted that predecode bits, not shown for clarity in the present description, may be associated with each 16-bit field in the cache line.
- An instruction cache in a processor complex supporting 16-bit and 32-bit instructions may be constructed having cache lines that may store, for example, N K-bit format aligned instructions plus one K/2-bit format instruction. It is noted that FIG. 3B is exemplary, and K-bit instructions may be stored on 8-bit byte address boundaries and 16-bit half-word boundaries. It is further noted that instruction set architectures having instruction formats which are not a multiple of each other, such as 16-bit and 24-bit instructions, are also supported by embodiments of the present invention.
- A cache line with a mix of 16-bit and 32-bit instructions may have a cache line crossing 32-bit instruction which would be stored in the last 32-bit location of a cache line making use of the extra K/2-bit space, such as the first cache line 322 with the 16-bit extension field 338.
- The low order 16-bit portion of the 32-bit cache line crossing instruction stored in the 16-bit cache extension field 338 is a duplicate of the 16-bit portion stored in the next sequential cache line in bit field 340.
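The layout of FIG. 3B, including the duplicated half-word in the extension field, can be reproduced with a short packing routine. This is an illustrative sketch only: it assumes four 16-bit fields per line as drawn in FIG. 3B, and the labels, the `.hi`/`.lo` suffixes, and the function name `pack` are hypothetical.

```python
from typing import Dict, List, Tuple

HALFWORDS_PER_LINE = 4  # four 16-bit fields per line, as drawn in FIG. 3B

PROGRAM = [
    ("ADD306", 16),
    ("LOAD307", 32),
    ("LOAD308", 32),
    ("ADD309", 16),
    ("STORE310", 32),
]


def pack(program: List[Tuple[str, int]]) -> List[Dict[str, object]]:
    """Pack instructions into cache lines, each with one extension field."""
    # Flatten the program into 16-bit units; a 32-bit instruction contributes
    # a high-order half followed by a low-order half.
    units: List[str] = []
    for label, bits in program:
        if bits == 16:
            units.append(label)
        else:
            units.extend([label + ".hi", label + ".lo"])

    lines = []
    for start in range(0, len(units), HALFWORDS_PER_LINE):
        fields = units[start:start + HALFWORDS_PER_LINE]
        nxt = start + HALFWORDS_PER_LINE
        # If the line ends with the first half of a 32-bit instruction, the
        # extension field holds a copy of the half stored in the next line.
        crossing = fields[-1].endswith(".hi") and nxt < len(units)
        lines.append({"fields": fields,
                      "extension": units[nxt] if crossing else None})
    return lines
```

With the program segment above, the first line ends with the high-order half of LOAD 308 and its extension field duplicates the low-order half, while the second line needs no extension.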
- a processor having instructions that are a multiple of 8-bits may also have line crossing instructions.
- the line crossing instruction may be split at a byte boundary with a one byte portion, a two byte portion, or a three byte portion of the instruction, for example, continuing on in the second cache line.
- the one byte portion, the two byte portion, or the three byte portion stored in the second cache line is copied and stored in a position associated with the first part of the line crossing instruction in the first cache line.
- a three byte extension to the cache line is provided.
- the cache extension field 338 would be expanded to a three byte bit field instead of its presently illustrated 16-bits.
- Other byte length instructions are possible and not precluded by this invention. Since a cache line crossing instruction may also cross a page boundary into a non-cacheable page and thus may not be cacheable, the page boundary (line/page) crossing non-cacheable instruction must be prevented from executing from the cache.
- extension field 338 could be expanded to store more than a portion of a single instruction, such as storing a first portion of a single line crossing instruction and also storing a second instruction which would generally be associated with the next logical page stored with the cache line that is making use of the expanded extension field.
- FIG. 4A illustrates a paged virtual memory system 400 having an instruction translation look aside buffer (ITLB) 402 and a physical memory 404 in accordance with an embodiment of the invention.
- a virtual address 405 is generally encoded in two parts.
- An upper field of address bits usually represents a virtual page number 406 that is encoded based on a selected page size, such as 4k byte pages.
- a lower field of address bits is a page offset 407 that identifies an address within the addressed page.
- the virtual page number is translated to a physical page number (P-page address).
- the page offset is the same for both the virtual address and the physical address and is not translated.
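The two-part encoding of the virtual address can be shown with bit operations. This is a minimal sketch assuming 4k byte pages, so the page offset occupies the low 12 bits; the function names `split` and `translate` and the dictionary used as a page mapping are hypothetical.

```python
from typing import Dict, Tuple

PAGE_BYTES = 4096                            # 4k byte pages
OFFSET_BITS = PAGE_BYTES.bit_length() - 1    # 12 offset bits for a 4k byte page


def split(virtual_addr: int) -> Tuple[int, int]:
    """Return (virtual page number, page offset)."""
    return virtual_addr >> OFFSET_BITS, virtual_addr & (PAGE_BYTES - 1)


def translate(virtual_addr: int, page_map: Dict[int, int]) -> int:
    """Translate only the page number; the offset passes through unchanged."""
    vpn, offset = split(virtual_addr)
    return (page_map[vpn] << OFFSET_BITS) | offset
```

Because the offset is untranslated, two consecutive bytes that straddle a page boundary can map to unrelated physical pages with different attributes.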
- a virtual to physical address translation system may include one or more translation look aside buffers (TLBs) associated with the various caches, such as level 1 and level 2 instruction and data caches, to improve performance of the translation process.
- An instruction TLB (ITLB) is a small cache that stores recent virtual to physical address translations along with attributes of the stored pages, such as entry validation and whether the page contains cacheable or non-cacheable instructions.
- the ITLB conventionally includes a content addressable memory (CAM) circuit coupled with a random access memory (RAM) circuit and is relatively small, such as having 32 or 64 entries. Each ITLB entry includes a tag in the CAM circuit having a recently used virtual page number associated with a translated physical page number in the RAM circuit.
- the paged virtual memory system 400 uses an ITLB 402 and a physical memory 404 having cacheable pages 408 and 410 intermixed with one or more non-cacheable pages, such as non-cacheable page 409.
- Each entry of the ITLB 402 has flags 412 comprising a valid (V) flag, a read (R) flag, a write (W) flag, and a cacheable indicator (L) 414, a virtual address tag 416, and an associated physical page address 418.
- the L field 414 may be a single bit appropriate for identifying a page as cacheable or non-cacheable. Whether a page is cacheable or non-cacheable may be determined statically during compilation and might depend on a variety of factors. For instance, if memory mapped input and output (I/O) devices are used in an actual implementation of a system, such memory mapped locations may be tagged as non-cacheable.
- FIG. 4B illustrates a virtual to physical address translation subsystem 440 in accordance with an embodiment of the invention.
- the translation subsystem 440 is comprised of a processor pipeline 442, an ITLB 444, a physical address buffer 446, an xTag circuit 447, an L1 Icache 448, an L2 cache circuit 450, a system memory 452, and a write control circuit 454.
- the ITLB 444 has an entry 456 with a cacheable indicator (L) 458 in the ITLB tags.
- the L1 Icache 448 comprises a tag field 470 associated with each line, a base extent 471 for storage of cached instructions, and an extension field 472 associated with each line.
- an exemplary first line 457 in the L1 Icache 448 comprises a set of cacheable instructions stored in a first base extent 473 of the first line 457.
- the non-cacheable portion of the instruction dictates that the 32-bit instruction be treated as a non-cacheable instruction.
- the first 16-bit portion (Ia) 474 is a cacheable portion having been fetched from a cacheable page.
- the second 16-bit portion (Ib) 475 is a non-cacheable portion having been fetched from a non-cacheable page.
- the first portion Ia 474 is stored in the first base extent 473 of the first line 457 and the second portion Ib 475 is stored in an extension field associated with the first line 457.
- the first line 457 also has an associated tag selected from the tag field 470 comprising one or more execute permission bits associated with the cacheable instructions stored in the first base extent 473.
- the L1 Icache 448 also includes an exemplary second line 459 comprising storage space for at least a second portion copy of Ib (Ib') 476. If the first portion Ia 474 and the second portion Ib 475 represented a cacheable instruction, the second portion copy Ib' 476 would be stored in the second line 459. In such a case with a cacheable line crossing instruction and depending on decisions made during an implementation, the second portion Ib 475 and the second portion copy Ib' 476 may switch positions since the contents of both portions are the same. However, in the exemplary case where the first portion Ia 474 and second portion Ib 475 are portions of a non-cacheable instruction, a second portion copy Ib' 476 is not stored in the second line 459.
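The storage rule for the two halves of a line crossing instruction can be sketched as follows. This is an illustrative model only, not the patented circuit: the `CacheLine` class, the list-based base extent, and the choice to place the copy at the start of the second line are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CacheLine:
    base: List[int] = field(default_factory=list)  # 16-bit units in the base extent
    extension: Optional[int] = None                # one 16-bit extension field


def store_line_crossing(first: CacheLine, second: CacheLine,
                        ia: int, ib: int, second_page_cacheable: bool) -> None:
    """Store a 32-bit line crossing instruction split into halves ia and ib."""
    first.base.append(ia)   # first half ends the base extent of the first line
    first.extension = ib    # second half goes in the first line's extension field
    if second_page_cacheable:
        # Only a cacheable second half is also copied into the second line.
        second.base.insert(0, ib)
```

When the second half comes from a non-cacheable page, the second line stays untouched, matching the exemplary case in the text.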
- a translation process begins by applying a virtual page number 406 selected from a virtual address 405 to a CAM circuit in the ITLB 444.
- the ITLB 444 performs a parallel comparison of the applied virtual page number 406, generally with all of the recently used virtual page numbers stored with the entry tags in the CAM tags 460.
- a translated physical address 463 comprises the translated physical page address 462 concatenated with the page offset 464 from the virtual address 405.
- a virtual address 405 is comprised of a virtual page number 406 having bits [31:12] and a page offset 407 having bits [11:0].
- the memory hierarchy of caches and main memory may encompass a physical memory space of 512k bytes and 4k byte pages.
- the virtual address 405 is translated to a physical address 463.
- the physical address 463 is comprised of a physical page number 462 having bits [28:12], of which bits [18:12] are required for the 512k byte implementation, and a page offset 464 having bits [11:0].
- tags including the cacheable indicator (L) 458 are also output and stored in the physical address buffer 446.
- the placement of the cacheable indicator (L) 458 and the tags 465 is exemplary.
- the physical address 463 is then applied to the L1 Icache 448.
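The translation flow described above can be modeled in a few lines. This is a behavioral sketch under stated assumptions: a dictionary stands in for the CAM/RAM pair, the class and method names are hypothetical, and the oldest-first eviction is an assumption because the text does not describe a replacement policy.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

PAGE_OFFSET_BITS = 12  # virtual address bits [11:0] are the page offset


@dataclass(frozen=True)
class ItlbEntry:
    valid: bool         # V flag
    cacheable: bool     # L indicator: True for a cacheable page
    physical_page: int  # translated physical page number


class Itlb:
    def __init__(self, capacity: int = 32) -> None:
        self.capacity = capacity
        self._entries: Dict[int, ItlbEntry] = {}  # keyed by virtual page number

    def insert(self, vpn: int, entry: ItlbEntry) -> None:
        if vpn not in self._entries and len(self._entries) >= self.capacity:
            # Assumed policy: drop the oldest inserted entry.
            self._entries.pop(next(iter(self._entries)))
        self._entries[vpn] = entry

    def translate(self, va: int) -> Optional[Tuple[int, bool]]:
        """Return (physical_address, cacheable), or None on an ITLB miss."""
        vpn = va >> PAGE_OFFSET_BITS
        offset = va & ((1 << PAGE_OFFSET_BITS) - 1)
        entry = self._entries.get(vpn)
        if entry is None or not entry.valid:
            return None
        return (entry.physical_page << PAGE_OFFSET_BITS) | offset, entry.cacheable
```

In a 512k byte physical memory only seven page number bits are needed, so a physical page number such as `0x7F` yields an address below `0x80000`. The cacheable indicator travels with the translated address, as the text describes for the physical address buffer.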
- attributes associated with the extra K/2-bit field line crossing instruction data may be specified with a control attribute that is stored and tracked separately from the attributes of the rest of the instructions in the cache line.
- the control attribute in this exemplary case of having a non-cacheable instruction stored in a cache line that is also a cache line crossing instruction would be set to indicate do not execute the non-cacheable instruction in any mode.
- the control attribute would be stored in at least one storage bit that is associated with the cache line having the line/page crossing instruction.
- When the portion of the line/page crossing instruction is fetched from the cache as part of a fetch group, a non-cacheable flag would be asserted in the xTag circuit 447.
- An xTag circuit, such as the xTag circuit 447, is implemented for each cache line that may contain a page crossing instruction. The xTag circuit 447 is accessed for flag data that is forwarded to the processor pipeline 442, which may generally occur only when the set of fetched cache line instructions contains a line crossing instruction. Permission bits associated with the cacheable instructions in the fetch group are also retrieved.
- the line/page crossing instruction or portion thereof having the control attribute may override the permission bits associated with the fetch group just for the line/page crossing instruction in order to not allow the line/page crossing instruction to execute in any mode.
- Such operation may be controlled by the noncacheable flag in the xTag circuit 447.
- the operation may also be controlled by providing xTag external permission bits (xPbits) 449 for just this line/page crossing instruction, which are stored in the xTag circuit 447 and which override the cache line permission bits just for the line/page crossing instruction.
- the permission bits for the cacheable instructions accessed from the associated tag field 470, the line/page crossing instruction or portion thereof from the extension field 472, such as the second portion Ib 475, and the xPbits 449, for example accessed on xTag 480, for the line/page crossing instruction from the xTag circuit 447 are forwarded to the processor pipeline 442.
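The override described above applies to one instruction in the fetch group and leaves the rest governed by the cache line's own permission bits. A minimal sketch of that decision follows; the function and parameter names are hypothetical, and the single boolean per input is a simplification of the permission bits in the text.

```python
def may_execute(line_execute_permitted: bool,
                is_crossing_instruction: bool,
                xtag_noncacheable: bool) -> bool:
    """Decide whether an instruction fetched from the cache may execute.

    line_execute_permitted:  execute permission from the cache line's tag field
    is_crossing_instruction: True only for the line/page crossing instruction
    xtag_noncacheable:       non-cacheable flag (or xPbits) from the xTag circuit
    """
    if is_crossing_instruction and xtag_noncacheable:
        # The xTag information overrides the line permissions for this
        # instruction only: do not execute in any mode.
        return False
    return line_execute_permitted
```

Because the override is scoped to the crossing instruction, the other instructions in the same line execute normally.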
- the processor pipeline 442 includes a detect (Dt) circuit 482, a first decode (Dc) circuit 483, a buffer and hold (B&H) circuit 484, a refetch circuit 485, a recombine circuit 486, and a multiplexer 487.
- the Dt circuit 482 detects that the second portion Ib 475 and the accessed xTag 480 have been received, generally in the pipeline stage that checks if execute permission is allowed and tags the second portion Ib 475 as do not execute.
- the Dc circuit 483 identifies whether the second portion Ib 475 is part of a page crossing instruction.
- the Dc circuit 483 decodes the data and determines, in this exemplary case, that the second portion Ib 475 is part of a page crossing instruction.
- the processor pipeline 442 operation continues with the B&H circuit 484 which buffers instructions it has received from the cache line and determines whether the second portion Ib 475 represents the oldest instruction in the fetch group. If the B&H circuit 484 determines the second portion Ib 475 does not represent the oldest instruction in the fetch group, the B&H circuit 484 buffers the second portion Ib 475 and holds it until it has been determined to represent the oldest instruction. At the time it is determined that the second portion Ib 475 represents the oldest instruction in the processor pipeline 442, a flush of the processor pipeline above the second portion Ib 475 is executed.
- the non-cacheable instruction is refetched from system memory 452 which reuses an existing dataflow associated with resolving a permission fault problem.
- the second portion Ib 475 may also be flushed or may be allowed to be overwritten.
- the flush of the good cacheable data in the cache line may not be necessary and the refetch circuit 485 refetches the second portion Ib 475 that has the non-cacheable attribute, bypassing the instruction cache, and obtaining the second portion Ib 475 directly from system memory 452, through multiplexor 477, for example.
- the recombine circuit 486 combines the first portion Ia 474 with the second portion Ib 475 received from the system memory 452 to form a complete instruction, Ia||Ib, and passes the instruction through the multiplexer 487 to be decoded and continue pipeline processing allowing the combined instruction to execute without having been fetched from the instruction cache. It is noted that any necessary predecode and decoding operations on the combined instruction may need to be repeated following proper pipeline protocol for execution. It is also noted that the Dt circuit 482 may be associated with a fetch pipeline stage, the Dc circuit 483 associated with a general decode pipeline stage, and the B&H circuit 484 associated with an instruction queue. The exemplary circuitry 482-487 may be placed in appropriate pipeline stages according to a particular implementation.
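The refetch and recombine steps can be sketched as a small function. This is a software illustration under assumptions, not the circuit itself: the names are hypothetical, `read_system_memory` stands in for a cache-bypassing fetch, and placing the first portion in the upper 16 bits of the combined instruction is an assumed bit ordering.

```python
from typing import Callable, Tuple


def recombine(ia: int, ib: int) -> int:
    """Form the 32-bit instruction Ia||Ib from two 16-bit halves."""
    return ((ia & 0xFFFF) << 16) | (ib & 0xFFFF)


def resolve_crossing_instruction(ia: int, cached_ib: int, noncacheable: bool,
                                 ib_address: int,
                                 read_system_memory: Callable[[int], int]
                                 ) -> Tuple[int, str]:
    """Return (instruction, source of the second half)."""
    if not noncacheable:
        # Cacheable case: the half held in the extension field is usable.
        return recombine(ia, cached_ib), "icache"
    # Non-cacheable case: discard the cached half and refetch it from
    # system memory, bypassing the instruction cache.
    fresh_ib = read_system_memory(ib_address)
    return recombine(ia, fresh_ib), "system_memory"
```

The cached first half is reused in both cases, so only the non-cacheable half costs a system memory access.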
- the processor pipeline stalls issuing the line/page crossing instruction and instructions following the line/page crossing instruction until a determination can be made whether the line/page crossing instruction has been reached. If the line/page crossing instruction is not reached, such as due to execution of a branch instruction, standard branch operations are followed. In one embodiment, if the line/page crossing instruction is reached, the line/page crossing instruction and instructions following the line/page crossing instruction are flushed and a non-cacheable request 235 is made to the system memory 227, bypassing the L1 Icache 218, for at least the line/page crossing instruction that was identified as non-cacheable.
- the non-cacheable instruction is returned on a system memory output bus 236 of FIG. 2, for example.
- at least that portion of the line/page crossing instruction that was duplicated in the previous cache line is refetched, the whole line/page crossing instruction is
- the line/page crossing instruction or portion thereof is returned from system memory with the proper attribute for a non-cached fetched instruction and the reconstructed instruction can be executed without being cached.
- a fixed length instruction set architecture could have unaligned instructions due, for example, to use of a Von Neumann architecture with data of varying data widths stored with the fixed length instructions.
- the combination of fixed length instructions with data of mixed widths could lead to the same problem and solution for any unaligned instruction that crosses a cache line and also crosses a page boundary between a cacheable page and a non-cacheable page.
- processor performance for executing the majority of instructions in the cache line with the single line/page crossing instruction that is not cacheable remains the same as execution of instructions fetched from any cache line not having such a line/page crossing instruction.
- the execute permission bits are stored in a tag associated with each line in the L1 Icache 448 and are valid for each cacheable instruction stored in the base extent 471.
- the non-cacheable indicator flag may be stored in additional permission bits associated with the second portion Ib 475 that is stored in an extension field associated with the first line 457.
- the additional permission bits are stored external to the L1 Icache 448 in the xTag circuit 447, for example, and indicate do not execute for any reason.
- a cache line is chosen to have a fixed number of 16-bit instructions or a fixed number of 32-bit instructions.
- a 4k byte page corresponds to 64 cache lines, which may be numbered 0 to 63.
- the 16-bit extension field may be stored in a separate array. Since only a line at set 63 may have a page crossing instruction, a fetch address is compared with the end of page address to determine whether to use the additional permission bits. The fetch address is also compared to determine if the addressed instruction is split across a cache line to identify it as a line crossing instruction.
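The address comparisons described above can be sketched with integer arithmetic. The 64-byte line size is derived from the text's figures (a 4k byte page divided into 64 lines) rather than stated directly, and the function names are hypothetical.

```python
PAGE_BYTES = 4096                          # 4k byte page
LINES_PER_PAGE = 64                        # lines numbered 0 to 63
LINE_BYTES = PAGE_BYTES // LINES_PER_PAGE  # derived: 64 bytes per line


def line_index_in_page(addr: int) -> int:
    """Index (0..63) of the cache line within its page."""
    return (addr % PAGE_BYTES) // LINE_BYTES


def crosses_line(addr: int, size_bytes: int) -> bool:
    """True if the instruction's first and last bytes fall in different lines."""
    return addr // LINE_BYTES != (addr + size_bytes - 1) // LINE_BYTES


def crosses_page(addr: int, size_bytes: int) -> bool:
    """True if the instruction's first and last bytes fall in different pages."""
    return addr // PAGE_BYTES != (addr + size_bytes - 1) // PAGE_BYTES
```

A 4-byte instruction at offset 4094 starts in line 63 and crosses both the line and the page, while the same instruction at offset 62 crosses only a line boundary, so only the last line of a page needs the additional permission bits.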
- the permission bits are generally written in a tag field associated with an accessed line.
- if an addressed cache line is not valid, as indicated, for example, by a valid flag in the Icache tags 470, the fetch is directed to the L2 cache 226 or to system memory 227.
- a fetch request speculatively returns a plurality of instructions for loading into the L1 Icache 218 and from which the requested instruction is returned to the processor 210.
- the cacheability attribute of the requested address such as the L bit 414 of FIG. 4A.
- the L bit 414 is then generally associated with the fetch group and resolved with other flags into the permission bits that are loaded into the tag associated with the fetch group in the cache line.
- the fetch group is not loaded into the L1 Icache 448.
- the first 16-bit portion of the instruction may be accessed from a cacheable page while the second 16-bit portion may be accessed from a non-cacheable page. Since generally, two lines are fetched on a miss in a first level instruction cache, the cacheability of the second line may also be
- the second 16-bit portion of the page and line crossing instruction may be loaded in the extension field associated with the rest of the cache line storing the fetch group and the extra permission bits (xPbits) 449 may be stored in the xTag circuit 447.
- the extension field may be expanded to store more than 16-bits accommodating, for example 32-bits or 48-bits, storing an additional 16-bit or 32-bit instruction.
- if predecode bits are associated with each instruction in a cache line, the extension field might be expanded to include 2 or 4 predecode bits per 16-bit portion stored.
- the extra data permission information may be identified from encodings of the pre-decode bits.
- the extra data permission information may be stored in any storage field that can be associated with the page crossing instruction.
- the page crossing instruction may also be identified in one or more extra predecode bits instead of identifying it based on a size and address calculation.
- the indication to "not execute for any reason" may be stored in the predecode bits to identify the page crossing instruction as a faulty instruction for the case of non-cacheable data stored in the instruction cache.
- FIG. 5 illustrates an exemplary two way set associative Icache circuit 500 having a line crossing instruction and a supporting line crossing indicator in accordance with an embodiment of the invention. While the invention is applicable to other cache designs such as a direct mapped cache, a four way, an eight way, up to and including fully associative caches, a two way set associative instruction cache is shown as an exemplary instruction cache circuit 500.
- a first way 514 includes a first way tag bit field 518, including permission bits and a cache line address tag for each line in the first way.
- the first way 514 further includes lines of data 519 shown with "n" instructions, Ic0-Icn and a first portion Ixa, for example, and an extension field 520 shown as storing a second portion Ixb.
- the instruction cache circuit 500 also comprises storage for a second way 516 having a second way tag bit field 522, including permission bits and a cache line address tag for each line in the second way.
- the second way 516 further includes lines of data 523 shown with "z" instructions, Ib0-Ibz, and an extension field 524 shown as not occupied. In general, the storage capacity of the lines in each way would be the same though capable of storing a different number of instructions of different length.
- each line may have a line and page crossing instruction, such as the instruction made up of Ixa and Ixb.
- the extra permission bits are stored separately in the xTag circuits 532 and 533 and are tracked separately.
- every cache line could have a line and page crossing instruction and the extra permission bits could then be included in the tag bit fields 518 and 522 in order to track the page crossing instructions in each line in the cache.
- although the extension fields 520 and 524 are shown directly associated with their corresponding line array, the extension fields 520 and 524 may be implemented in arrays separate from the line arrays.
- FIG. 6 illustrates a process 600 for managing page crossing instructions with different cacheability in accordance with an embodiment of the invention.
- a cache line is established with extended storage for a cache line crossing instruction and an attribute flag for the extended storage.
- the attribute flag may be stored external to the cache.
- a page crossing instruction is fetched, for example in a fetch group of instructions, including a second portion of the page crossing instruction from the extended storage.
- the attribute flag is captured from an xTag circuit to track the attribute flag with the second portion of the page crossing instruction through the processor pipeline.
- it is detected that the second portion of the page crossing instruction has been received in the processor pipeline, that the page crossing instruction originated from the instruction cache, and that it is tagged as not executable in any mode.
- the page crossing instruction is decoded to identify it to the processor pipeline and tag it internal to the processor pipeline as do not execute.
- a flush is not executed and only the page crossing instruction or the second portion of the page crossing instruction that has the non-cacheable attribute is fetched directly from system memory.
- the page crossing instruction is refetched or at least the second portion of the page crossing instruction is refetched from system memory bypassing the instruction cache. If the second portion is refetched, the first cacheable portion of the page crossing instruction is reserved for an operation to reconstruct the non-cacheable instruction.
- the page crossing instruction is reconstructed if required by combining the cacheable first portion with the second portion that was refetched from system memory and executed as non-cacheable.
- the present invention is not limited to the illustrated instruction flow logic 200 and is further applicable to any pipeline processor having variable length instructions which may also store predecode information in an instruction cache.
- Extensions to a variable length processor instruction set may be accommodated by the present invention if the extension supports a unique mode of instruction set use.
- a mode of operation may be specified where 16-bit, 32-bit, and 64-bit instructions are operative, such that 32-bit and 64-bit instructions may span across two L1 Icache lines.
- the processor using 64-bit instruction types may be an extension of the exemplary processor 204 described above.
- the extended processor could have operating mode states encoded for example for a first state restricted to only 32-bit instructions, a second state for both 16-bit and 32-bit instructions, a third state for 16-bit, 32-bit, and 64-bit instructions, and a fourth state restricted to only 64-bit instructions.
- a 64-bit instruction in an Icache line could be partitioned into four 16-bit fields.
- An extension bit field may be used having 48-bits to allow a 64-bit instruction to be split across four 16-bit portions in a line and page crossing situation.
- the present invention is also not limited to instruction lengths that are powers of two.
- an instruction cache line may be partitioned into 8-bit instruction sections.
- a 24-bit instruction could consist of three 8-bit sections, for example.
- a 192-bit base extent cache line would be able to hold twelve 16-bit instructions or eight 24-bit instructions.
- a 16-bit extension field would allow the 24-bit instructions to be split into three 8-bit portions.
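The line capacity and extension field sizes quoted in the text follow from simple arithmetic, sketched below. The helper names are hypothetical, and the sizing rule (the longest instruction minus one section) is an inference from the stated examples: it assumes the worst case is an instruction whose first section is the last section of the base extent.

```python
def instructions_per_line(line_bits: int, instr_bits: int) -> int:
    """Number of fixed-size instructions that fill a base extent exactly."""
    if line_bits % instr_bits != 0:
        raise ValueError("instruction size does not divide the line size")
    return line_bits // instr_bits


def extension_field_bits(max_instr_bits: int, section_bits: int) -> int:
    """Bits needed to hold everything after the first section of the
    longest instruction (assumed worst-case split)."""
    return max_instr_bits - section_bits
```

With these helpers, a 192-bit base extent holds 12 instructions of 16 bits or 8 instructions of 24 bits; a 32-bit instruction in 16-bit sections needs a 16-bit extension, a 64-bit instruction needs 48 bits, and a 24-bit instruction in 8-bit sections needs 16 bits, in agreement with the figures given in the text.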
- An embodiment also addresses an alternative cache that may be configured with an extension data storage portion, such as the extension field 472 of FIG. 4B, structured at the beginning of the cache lines.
- an embodiment addresses a first instruction having a first half that is non-cacheable and a second half that is cacheable and the rest of the cache line having data that is cacheable.
- the procedures for handling the non-cacheable portion of the first instruction in this alternative cache operate in a manner similar to the procedures for handling the non-cacheable portion of the last instruction for the cache shown in FIG. 4B, as described herein.
- the methods described in connection with the embodiments disclosed herein may be embodied in hardware and used by software from a memory module that stores non-transitory signals executed by a processor.
- the software may support execution of the hardware as described herein or may be used to emulate the methods and apparatus for managing page crossing instructions with different cacheability.
- the software module may reside in random access memory (RAM), flash memory, read only memory (ROM), electrically programmable read only memory (EPROM), hard disk, a removable disk, tape, compact disk read only memory (CD-ROM), or any other form of non-transient storage medium known in the art.
- a storage medium may be coupled to the processor such that the processor can read information from, and in some cases write information to, the storage medium.
- the storage medium coupling to the processor may be a direct coupling integral to a circuit implementation or may utilize one or more interfaces, supporting direct accesses or data streaming using downloading techniques.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Memory System Of A Hierarchy Structure (AREA)
- Advance Control (AREA)
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201380047990.8A CN104662520B (zh) | 2012-09-26 | 2013-09-26 | 用于管理具有不同高速缓存能力的跨页指令的方法和设备 |
| JP2015533311A JP6196310B2 (ja) | 2012-09-26 | 2013-09-26 | 異なるキャッシュ可能性を用いてページ横断命令を管理するための方法および装置 |
| EP13776635.8A EP2901288B1 (en) | 2012-09-26 | 2013-09-26 | Methods and apparatus for managing page crossing instructions with different cacheability |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US13/626,916 | 2012-09-26 | ||
| US13/626,916 US8819342B2 (en) | 2012-09-26 | 2012-09-26 | Methods and apparatus for managing page crossing instructions with different cacheability |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2014052561A1 true WO2014052561A1 (en) | 2014-04-03 |
Family
ID=49354919
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2013/061876 Ceased WO2014052561A1 (en) | 2012-09-26 | 2013-09-26 | Methods and apparatus for managing page crossing instructions with different cacheability |
Country Status (5)
| Country | Link |
|---|---|
| US (1) | US8819342B2 |
| EP (1) | EP2901288B1 |
| JP (1) | JP6196310B2 |
| CN (1) | CN104662520B |
| WO (1) | WO2014052561A1 |
Families Citing this family (12)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9460018B2 (en) | 2012-05-09 | 2016-10-04 | Qualcomm Incorporated | Method and apparatus for tracking extra data permissions in an instruction cache |
| US9348598B2 (en) * | 2013-04-23 | 2016-05-24 | Arm Limited | Data processing apparatus and method for pre-decoding instructions to be executed by processing circuitry |
| US10318427B2 (en) * | 2014-12-18 | 2019-06-11 | Intel Corporation | Resolving memory accesses crossing cache line boundaries |
| US10176096B2 (en) * | 2016-02-22 | 2019-01-08 | Qualcomm Incorporated | Providing scalable dynamic random access memory (DRAM) cache management using DRAM cache indicator caches |
| US20170255569A1 (en) * | 2016-03-01 | 2017-09-07 | Qualcomm Incorporated | Write-allocation for a cache based on execute permissions |
| CN105786717B (zh) * | 2016-03-22 | 2018-11-16 | 华中科技大学 | 软硬件协同管理的dram-nvm层次化异构内存访问方法及系统 |
| US10204053B2 (en) * | 2016-09-30 | 2019-02-12 | Oracle International Corporation | Modeling processor shared memory using a cacheability status |
| US11726783B2 (en) * | 2020-04-23 | 2023-08-15 | Advanced Micro Devices, Inc. | Filtering micro-operations for a micro-operation cache in a processor |
| US11740973B2 (en) * | 2020-11-23 | 2023-08-29 | Cadence Design Systems, Inc. | Instruction error handling |
| CN112631490A (zh) * | 2020-12-30 | 2021-04-09 | 北京飞讯数码科技有限公司 | 显示界面控制方法、装置、计算机设备及存储介质 |
| US11995010B2 (en) * | 2021-04-16 | 2024-05-28 | Avago Technologies International Sales Pte. Limited | Adaptor storage system of and method |
| CN118796278B (zh) * | 2024-09-14 | 2024-11-29 | 北京微核芯科技有限公司 | 处理器取指令方法、装置、设备和存储介质 |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20060265572A1 (en) * | 2005-05-18 | 2006-11-23 | Stempel Brian M | Handling cache miss in an instruction crossing a cache line boundary |
| US20070255905A1 (en) * | 2006-05-01 | 2007-11-01 | Morrow Michael W | Method and Apparatus for Caching Variable Length Instructions |
| US7330959B1 (en) * | 2004-04-23 | 2008-02-12 | Transmeta Corporation | Use of MTRR and page attribute table to support multiple byte order formats in a computer system |
Family Cites Families (15)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPS6020255A (ja) * | 1983-07-15 | 1985-02-01 | Fujitsu Ltd | バツフア記憶制御方式 |
| JPH0254351A (ja) * | 1988-08-18 | 1990-02-23 | Mitsubishi Electric Corp | データ処理システム |
| EP1224539A1 (en) | 1999-10-14 | 2002-07-24 | Advanced Micro Devices, Inc. | Apparatus and method for caching alignment information |
| US20040103251A1 (en) * | 2002-11-26 | 2004-05-27 | Mitchell Alsup | Microprocessor including a first level cache and a second level cache having different cache line sizes |
| US7406613B2 (en) | 2004-12-02 | 2008-07-29 | Qualcomm Incorporated | Translation lookaside buffer (TLB) suppression for intra-page program counter relative or absolute address branch instructions |
| JP4837305B2 (ja) * | 2005-05-10 | 2011-12-14 | ルネサスエレクトロニクス株式会社 | マイクロプロセッサ及びマイクロプロセッサの制御方法 |
| US8117404B2 (en) | 2005-08-10 | 2012-02-14 | Apple Inc. | Misalignment predictor |
| US20080120468A1 (en) * | 2006-11-21 | 2008-05-22 | Davis Gordon T | Instruction Cache Trace Formation |
| US8239657B2 (en) | 2007-02-07 | 2012-08-07 | Qualcomm Incorporated | Address translation method and apparatus |
| US8898437B2 (en) | 2007-11-02 | 2014-11-25 | Qualcomm Incorporated | Predecode repair cache for instructions that cross an instruction cache line |
| US8347067B2 (en) * | 2008-01-23 | 2013-01-01 | Arm Limited | Instruction pre-decoding of multiple instruction sets |
| US8140768B2 (en) | 2008-02-01 | 2012-03-20 | International Business Machines Corporation | Jump starting prefetch streams across page boundaries |
| US8639943B2 (en) | 2008-06-16 | 2014-01-28 | Qualcomm Incorporated | Methods and systems for checking run-time integrity of secure code cross-reference to related applications |
| US8560811B2 (en) | 2010-08-05 | 2013-10-15 | Advanced Micro Devices, Inc. | Lane crossing instruction selecting operand data bits conveyed from register via direct path and lane crossing path for execution |
| US9460018B2 (en) | 2012-05-09 | 2016-10-04 | Qualcomm Incorporated | Method and apparatus for tracking extra data permissions in an instruction cache |
- 2012-09-26: US, application US13/626,916, patent US8819342B2, status not active (Expired - Fee Related)
- 2013-09-26: CN, application CN201380047990.8A, patent CN104662520B, status not active (Expired - Fee Related)
- 2013-09-26: WO, application PCT/US2013/061876, patent WO2014052561A1, status not active (Ceased)
- 2013-09-26: EP, application EP13776635.8A, patent EP2901288B1, status not active (Not-in-force)
- 2013-09-26: JP, application JP2015533311A, patent JP6196310B2, status not active (Expired - Fee Related)
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR20180008625A (ko) * | 2015-05-14 | 2018-01-24 | 자일링크스 인코포레이티드 | 프로그램가능한 집적 회로에서의 메모리 리소스들의 관리 |
| JP2018519571A (ja) * | 2015-05-14 | 2018-07-19 | ザイリンクス インコーポレイテッドXilinx Incorporated | プログラマブル集積回路におけるメモリリソースの管理 |
| KR102534161B1 (ko) | 2015-05-14 | 2023-05-17 | 자일링크스 인코포레이티드 | 프로그램가능한 집적 회로에서의 메모리 리소스들의 관리 |
Also Published As
| Publication number | Publication date |
|---|---|
| CN104662520A (zh) | 2015-05-27 |
| US8819342B2 (en) | 2014-08-26 |
| JP6196310B2 (ja) | 2017-09-13 |
| JP2015534687A (ja) | 2015-12-03 |
| EP2901288A1 (en) | 2015-08-05 |
| EP2901288B1 (en) | 2018-08-08 |
| US20140089598A1 (en) | 2014-03-27 |
| CN104662520B (zh) | 2018-05-29 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 13776635; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | Wipo information: entry into national phase | Ref document number: 2013776635; Country of ref document: EP |
| | ENP | Entry into the national phase | Ref document number: 2015533311; Country of ref document: JP; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |