US12248399B2 - Multi-block cache fetch techniques - Google Patents
- Publication number: US12248399B2
- Authority: United States
Classifications
- G06F12/0811—Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies
- G06F12/0246—Memory management in non-volatile memory, e.g. resistive RAM or ferroelectric memory, in block erasable memory, e.g. flash memory
- G06F12/0846—Cache with multiple tag or data arrays being simultaneously accessible
- G06F12/0879—Burst mode
- G06F12/0891—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches, using clearing, invalidating or resetting means
- G06F12/0895—Caches characterised by their organisation or structure of parts of caches, e.g. directory or tag array
- G06F13/1605—Handling requests for interconnection or transfer for access to memory bus based on arbitration
- G06F13/1668—Details of memory controller
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- This disclosure relates generally to computer caches and more particularly to multi-block fetch requests for cache misses.
- Caching is a well-known computing technique in which a subset of data is stored in (typically) faster cache circuitry and may be accessed multiple times without accessing a higher-level cache or memory.
- Memory read requests typically include an address and a portion of the address is used as a “tag” to determine whether data is already present in the cache. If not, a cache miss occurs and the requested data is fetched (and typically stored in the cache for potential subsequent requests).
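- The tag/index/offset split described above can be sketched as follows. The bit widths are hypothetical (64-byte blocks, 256 sets) and chosen only for illustration; a real cache's geometry determines the actual field sizes.

```python
# Hypothetical cache geometry: 64-byte blocks, 256 sets.
BLOCK_OFFSET_BITS = 6   # log2(64)
SET_INDEX_BITS = 8      # log2(256)

def decompose(address):
    """Split an address into the tag, set index, and block offset
    used for the hit/miss check described above."""
    offset = address & ((1 << BLOCK_OFFSET_BITS) - 1)
    set_index = (address >> BLOCK_OFFSET_BITS) & ((1 << SET_INDEX_BITS) - 1)
    tag = address >> (BLOCK_OFFSET_BITS + SET_INDEX_BITS)
    return tag, set_index, offset
```

With these parameters, the set index selects one set of the cache and the tag portion is compared against the stored tag(s) for that set to decide hit or miss.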
- Caching may be implemented for various types of information, e.g., instructions and data. Similarly, caching may be utilized for translation data, e.g., in a translation lookaside buffer (TLB) that translates virtual addresses to physical addresses. More generally, caching may be utilized for translation information from one address space to another. In caches for translation, cache blocks may store page table entries.
- FIG. 1 A is a diagram illustrating an overview of example graphics processing operations, according to some embodiments.
- FIG. 1 B is a block diagram illustrating an example graphics unit, according to some embodiments.
- FIG. 2 is a block diagram illustrating example cache circuitry with a shared tag for multiple blocks, according to some embodiments.
- FIG. 3 is a block diagram illustrating an example miss fetch scoreboard for multi-block fetch requests, according to some embodiments.
- FIG. 4 is a block diagram illustrating an example miss fetch scoreboard for caching in the context of address translations, according to some embodiments.
- FIG. 5 is a flow diagram illustrating an example method, according to some embodiments.
- FIG. 6 is a block diagram illustrating an example computing device, according to some embodiments.
- FIG. 7 is a diagram illustrating example applications of disclosed systems and devices, according to some embodiments.
- FIG. 8 is a block diagram illustrating an example computer-readable medium that stores circuit design information, according to some embodiments.
- cache control circuitry is configured to share a tag across multiple cache blocks and aggregate multiple fetch requests that share the tag, e.g., until one of the requests wins arbitration. At that point, the control circuitry issues a multi-block fetch request.
- Disclosed techniques may advantageously reduce transactions on a fetch bus and efficiently use its bandwidth, particularly in embodiments with a fetch bus that is wider than the cache block size. Disclosed techniques may also reduce power consumption by less frequently updating the tag. Further, disclosed techniques may reduce tag area due to shared tag addresses.
- FIGS. 1 A- 1 B provide an overview of graphics processing, which may implement disclosed caching techniques (although these techniques may also be implemented in other contexts such as central processing units, memory controllers, etc.).
- FIG. 2 provides an overview of cache control circuitry configured to issue multi-block fetch requests.
- FIGS. 3 and 4 provide example scoreboard circuitry configured to aggregate fetch requests.
- The remaining figures provide example methods, devices, systems, and computer-readable media.
- transform and lighting procedure 110 may involve processing lighting information for vertices received from an application based on defined light source locations, reflectance, etc., assembling the vertices into polygons (e.g., triangles), and transforming the polygons to the correct size and orientation based on position in a three-dimensional space.
- Clip procedure 115 may involve discarding polygons or vertices that fall outside of a viewable area.
- Rasterize procedure 120 may involve defining fragments within each polygon and assigning initial color values for each fragment, e.g., based on texture coordinates of the vertices of the polygon.
- Fragments may specify attributes for pixels which they overlap, but the actual pixel attributes may be determined based on combining multiple fragments (e.g., in a frame buffer), ignoring one or more fragments (e.g., if they are covered by other objects), or both.
- Shade procedure 130 may involve altering pixel components based on lighting, shadows, bump mapping, translucency, etc. Shaded pixels may be assembled in a frame buffer 135 .
- Modern GPUs typically include programmable shaders that allow customization of shading and other processing procedures by application developers. Thus, in various embodiments, the example elements of FIG. 1 A may be performed in various orders, performed in parallel, or omitted. Additional processing procedures may also be implemented.
- Vertex pipe 185 may include various fixed-function hardware configured to process vertex data. Vertex pipe 185 may be configured to communicate with programmable shader 160 in order to coordinate vertex processing. In the illustrated embodiment, vertex pipe 185 is configured to send processed data to fragment pipe 175 or programmable shader 160 for further processing.
- Fragment pipe 175 may include various fixed-function hardware configured to process pixel data. Fragment pipe 175 may be configured to communicate with programmable shader 160 in order to coordinate fragment processing. Fragment pipe 175 may be configured to perform rasterization on polygons from vertex pipe 185 or programmable shader 160 to generate fragment data. Vertex pipe 185 and fragment pipe 175 may be coupled to memory interface 180 (coupling not shown) in order to access graphics data.
- Programmable shader 160 in the illustrated embodiment, is configured to receive vertex data from vertex pipe 185 and fragment data from fragment pipe 175 and TPU 165 .
- Programmable shader 160 may be configured to perform vertex processing tasks on vertex data which may include various transformations and adjustments of vertex data.
- Programmable shader 160 in the illustrated embodiment, is also configured to perform fragment processing tasks on pixel data such as texturing and shading, for example.
- Programmable shader 160 may include multiple sets of multiple execution pipelines for processing data in parallel.
- programmable shader includes pipelines configured to execute one or more different SIMD groups in parallel.
- Each pipeline may include various stages configured to perform operations in a given clock cycle, such as fetch, decode, issue, execute, etc.
- the concept of a processor “pipeline” is well understood, and refers to the concept of splitting the “work” a processor performs on instructions into multiple stages.
- instruction decode, dispatch, execution (i.e., performance), and retirement may be examples of different pipeline stages.
- Many different pipeline architectures are possible with varying orderings of elements/portions.
- Various pipeline stages perform such steps on an instruction during one or more processor clock cycles, then pass the instruction or operations associated with the instruction on to other stages for further processing.
- SIMD group is intended to be interpreted according to its well-understood meaning, which includes a set of threads for which processing hardware processes the same instruction in parallel using different input data for the different threads.
- Various types of computer processors may include sets of pipelines configured to execute SIMD instructions.
- graphics processors often include programmable shader cores that are configured to execute instructions for a set of related threads in a SIMD fashion.
- names that may be used for a SIMD group include: a wavefront, a clique, or a warp.
- a SIMD group may be a part of a larger thread group, which may be broken up into a number of SIMD groups based on the parallel processing capabilities of a computer.
- each thread is assigned to a hardware pipeline that fetches operands for that thread and performs the specified operations in parallel with other pipelines for the set of threads.
- processors may have a large number of pipelines such that multiple separate SIMD groups may also execute in parallel.
- each thread has private operand storage, e.g., in a register file. Thus, a read of a particular register from the register file may provide the version of the register for each thread in a SIMD group.
- multiple programmable shader units 160 are included in a GPU.
- global control circuitry may assign work to the different sub-portions of the GPU which may in turn assign work to shader cores to be processed by shader pipelines.
- TPU 165 in the illustrated embodiment, is configured to schedule fragment processing tasks from programmable shader 160 .
- TPU 165 is configured to pre-fetch texture data and assign initial colors to fragments for further processing by programmable shader 160 (e.g., via memory interface 180 ).
- TPU 165 may be configured to provide fragment components in normalized integer formats or floating-point formats, for example.
- TPU 165 is configured to provide fragments in groups of four (a “fragment quad”) in a 2×2 format to be processed by a group of four execution pipelines in programmable shader 160 .
- cache circuitry 210 implements a read-only cache.
- table walk circuitry may update cache 210 to cache translation data during table walks, but incoming requests are reads only and do not include writes, in some embodiments.
- FIG. 3 is a block diagram illustrating example cache tag circuitry and a miss fetch scoreboard, according to some embodiments. Illustrated circuitry may be included in control circuitry 220 , cache circuitry 210 , or both.
- cache tag circuitry 310 may determine a cache hit and provide the cached block. For an incoming request that does not match any tag, cache tag circuitry 310 may indicate a miss, initiate a fetch request, and may allocate a cache entry for the tag. For an incoming request that matches a tag but does not have a corresponding valid block, circuitry 310 may indicate a miss and initiate a fetch request. In various situations where cache tag circuitry 310 initiates a fetch, it may set the fetch pending field in circuitry 310 for the tag and set the corresponding block request valid field in miss fetch scoreboard 320 . Control circuitry may clear the fetch pending field for a tag when all fetches for blocks that share the tag have completed.
- miss fetch scoreboard 320 is configured to compare the request valid bits for the blocks with the blocks included in the fetch response. If all blocks were successfully returned for blocks with valid requests, miss fetch scoreboard 320 may clear the fetch pending field in circuitry 310 and clear the corresponding block request valid fields. If not all blocks were returned (e.g., due to incoming additional fetch requests to the tag after the multi-block fetch request was transmitted), miss fetch scoreboard 320 may update only blocks with returned data and may initiate a re-fetch for remaining block(s).
- miss fetch scoreboard 320 may allocate an entry and set a block request valid field in response to a fetch request for a tag that is not already present in the scoreboard. Similarly, for a fetch request whose tag is already in the scoreboard, miss fetch scoreboard 320 may set a corresponding block request valid field.
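- The allocate-or-merge behavior just described can be sketched behaviorally as follows. The class and method names are illustrative, and four blocks per tag is a hypothetical choice, not a parameter taken from the disclosure.

```python
class MissFetchScoreboard:
    """Behavioral sketch of a miss fetch scoreboard: one entry per
    outstanding tag, with a block-request-valid bit per cache block
    that shares the tag (4 blocks here, chosen for illustration)."""

    BLOCKS_PER_TAG = 4

    def __init__(self):
        self.entries = {}  # tag -> list of block-request-valid bits

    def record_miss(self, tag, block_index):
        # Allocate an entry for a tag not already present in the
        # scoreboard, or merge into the existing entry for that tag.
        valid = self.entries.setdefault(tag, [False] * self.BLOCKS_PER_TAG)
        valid[block_index] = True

    def requested_blocks(self, tag):
        # Blocks with a valid fetch request for this tag.
        return [i for i, v in enumerate(self.entries.get(tag, [])) if v]
```

Successive misses to different blocks under the same tag thus accumulate in one scoreboard entry rather than producing separate fetch transactions.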
- disclosed techniques may advantageously reduce transactions on the fetch bus (which may more efficiently utilize fetch bus bandwidth) and may reduce power consumption by less frequently updating the tag fields, relative to traditional techniques. Further, disclosed techniques may reduce tag circuitry area, e.g., due to sharing tag address portions for comparison across multiple cache blocks.
- FIG. 4 is a block diagram illustrating example miss fetch scoreboard circuitry in the translation lookaside buffer (TLB) context, according to some embodiments.
- the translation request includes an address in a first address space (e.g., a virtual address) and the fetch response includes information such as a page table entry (PTE) associated with translating the address to a second address space (e.g., a physical space).
- the embodiment of FIG. 4 may operate similarly to the embodiment of FIG. 3 .
- the cache blocks store PTEs and a PTE valid field is included for each cache block that shares a tag in cache tag circuitry 410 .
- miss fetch scoreboard circuitry 420 includes a PTE request valid field for each block that shares a tag.
- Cache tag circuitry 410 may be a TLB cache or may be an internal table walk cache (e.g., that stores page table entries, page directory entries, page catalog entries, etc.). In embodiments with multiple levels of caching, different address portions may be used for tag, set select, and block offset. In some embodiments, disclosed multi-block fetch techniques may be used at multiple different cache levels.
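- In the translation context, a single-level lookup can be sketched as follows, with a Python dict standing in for the TLB or table-walk structures and 4 KiB pages assumed purely for illustration.

```python
PAGE_OFFSET_BITS = 12  # 4 KiB pages (illustrative assumption)

def translate(virtual_address, page_table):
    """Translate a virtual address to a physical address using a flat
    page-table mapping (a stand-in for the cached PTEs in the text)."""
    vpn = virtual_address >> PAGE_OFFSET_BITS
    offset = virtual_address & ((1 << PAGE_OFFSET_BITS) - 1)
    pte = page_table.get(vpn)  # a miss here would trigger a fetch / table walk
    if pte is None:
        return None
    return (pte << PAGE_OFFSET_BITS) | offset
```

The page offset passes through unchanged; only the virtual page number is translated, which is why PTEs (rather than raw data) are what the translation caches hold.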
- multi-block fetch requests are not pre-fetches (e.g., fetches based on predictions that cache blocks will be needed in the future rather than based on actual requests). Therefore, in the embodiments of FIGS. 3 and 4 , for example, rather than blindly fetching a set of contiguous blocks of data, scoreboard circuitry maintains a field per potential block indicating whether that block is requested, while non-requested blocks are not fetched.
- Disclosed techniques may be more efficient than pre-fetching in certain implementations, particularly where access to a block may not imply a greater likelihood of accessing the next adjacent block in the near future, e.g., in page table walk embodiments. In some embodiments, disclosed techniques may be used in conjunction with pre-fetching at a given cache level.
- FIG. 5 is a flow diagram illustrating an example method for generating a multi-block fetch request, according to some embodiments.
- the method shown in FIG. 5 may be used in conjunction with any of the computer circuitry, systems, devices, elements, or components disclosed herein, among others.
- some of the method elements shown may be performed concurrently, in a different order than shown, or may be omitted. Additional method elements may also be performed as desired.
- cache circuitry (e.g., cache circuitry 210 ) stores multiple cache blocks.
- the cache blocks store respective page table entries and the cache circuitry is included in translation circuitry that is configured to convert an input address in a first address space to an output address in a second address space based on a page table entry (e.g., for a TLB that caches translation information for translations from virtual addresses to physical addresses).
- the cache circuitry maintains a validity field per cache block and is configured to use the tag circuitry and validity fields to determine hits and misses at cache block granularity.
- cache control circuitry may determine hits and misses at a smaller granularity (e.g., block granularity) than the amount of data that may be fetched from a next-level cache or memory (e.g., multiple blocks).
- tag circuitry maintains a tag value shared by multiple cache blocks.
- the cache circuitry maintains a fetch pending field for a set of cache blocks that share the tag value.
- the fetch pending field may indicate whether a fetch is pending for any of the blocks that share the tag value.
- cache control circuitry (e.g., control circuitry 220 ) initiates, in response to a miss for a request for a first block of the multiple cache blocks, a fetch request to a next level cache or memory.
- the aggregation circuitry maintains, for the tag, a set of fields that indicate whether respective cache blocks that share the tag have a valid fetch request (e.g., the block request valid fields of FIG. 3 ).
- aggregation circuitry aggregates multiple fetch requests for cache blocks that share the tag value.
- arbitration circuitry is configured to arbitrate among requests to use a fetch bus and the aggregation circuitry is configured to aggregate fetch requests for cache blocks that share the tag value until a request with the tag value wins arbitration for the fetch bus.
- the fetch bus has a width that is sufficient to fetch data in parallel for the multiple cache blocks that share the tag.
- the fetch bus may return data for the multi-block fetch requests over multiple beats (although note that the number of beats may be smaller than the number of blocks fetched, in some embodiments).
- fetch circuitry (e.g., fetch circuitry 224 ) initiates a single multi-block fetch operation to the next level cache or memory that returns cache blocks for the aggregated multiple fetch requests.
- the fetch circuitry is configured to receive a fetch response for the multi-block fetch operation and update the set of fields that indicate whether respective cache blocks that share the tag have a valid fetch request, based on cache blocks indicated as fetched in the fetch response. For example, the fetch circuitry may update miss fetch scoreboard 320 .
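- The aggregate/fetch/reconcile sequence above can be sketched as follows; the function signature, the set-of-block-indices response format, and the re-fetch return value are assumptions for illustration, not the claimed circuitry.

```python
def issue_and_reconcile(scoreboard_valid, fetch):
    """scoreboard_valid: mutable per-block request-valid bits for one tag.
    fetch: callable taking a set of requested block indices and returning
    the set of block indices actually returned by the next level.
    Returns the blocks that still need a re-fetch."""
    requested = {i for i, v in enumerate(scoreboard_valid) if v}
    returned = fetch(requested)          # single multi-block fetch
    for i in returned & requested:
        scoreboard_valid[i] = False      # clear satisfied requests
    # Requests added after the fetch was transmitted (or blocks the
    # response omitted) remain valid and are candidates for re-fetch.
    return {i for i, v in enumerate(scoreboard_valid) if v}
```

This mirrors the behavior described for scoreboard 320: only blocks confirmed in the response are cleared, and any leftover valid bits drive a re-fetch.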
- device 600 includes fabric 610 , compute complex 620 , input/output (I/O) bridge 650 , cache/memory controller 645 , graphics unit 675 , and display unit 665 .
- device 600 may include other components (not shown) in addition to or in place of the illustrated components, such as video processor encoders and decoders, image processing or recognition elements, computer vision elements, etc.
- Disclosed multi-block fetch techniques may be implemented in various elements of FIG. 6 , including one or more of cores 635 and 640 , graphics unit 675 , and controller 645 , for example.
- Fabric 610 may include various interconnects, buses, MUX's, controllers, etc., and may be configured to facilitate communication between various elements of device 600 . In some embodiments, portions of fabric 610 may be configured to implement various different communication protocols. In other embodiments, fabric 610 may implement a single communication protocol and elements coupled to fabric 610 may convert from the single communication protocol to other communication protocols internally.
- compute complex 620 includes bus interface unit (BIU) 625 , cache 630 , and cores 635 and 640 .
- compute complex 620 may include various numbers of processors, processor cores and caches.
- compute complex 620 may include 1, 2, or 4 processor cores, or any other suitable number.
- cache 630 is a set associative L2 cache.
- cores 635 and 640 may include internal instruction and data caches.
- a coherency unit (not shown) in fabric 610 , cache 630 , or elsewhere in device 600 may be configured to maintain coherency between various caches of device 600 .
- BIU 625 may be configured to manage communication between compute complex 620 and other elements of device 600 .
- Processor cores such as cores 635 and 640 may be configured to execute instructions of a particular instruction set architecture (ISA) which may include operating system instructions and user application instructions.
- Cache/memory controller 645 may be configured to manage transfer of data between fabric 610 and one or more caches and memories.
- cache/memory controller 645 may be coupled to an L3 cache, which may in turn be coupled to a system memory.
- cache/memory controller 645 may be directly coupled to a memory.
- cache/memory controller 645 may include one or more internal caches.
- the term “coupled to” may indicate one or more connections between elements, and a coupling may include intervening elements.
- graphics unit 675 may be described as “coupled to” a memory through fabric 610 and cache/memory controller 645 .
- graphics unit 675 is “directly coupled” to fabric 610 because there are no intervening elements.
- Graphics unit 675 may include one or more processors, e.g., one or more graphics processing units (GPU's). Graphics unit 675 may receive graphics-oriented instructions, such as OPENGL®, Metal, or DIRECT3D® instructions, for example. Graphics unit 675 may execute specialized GPU instructions or perform other operations based on the received graphics-oriented instructions. Graphics unit 675 may generally be configured to process large blocks of data in parallel and may build images in a frame buffer for output to a display, which may be included in the device or may be a separate device. Graphics unit 675 may include transform, lighting, triangle, and rendering engines in one or more graphics processing pipelines. Graphics unit 675 may output pixel information for display images. Graphics unit 675 , in various embodiments, may include programmable shader circuitry which may include highly parallel execution cores configured to execute graphics programs, which may include pixel tasks, vertex tasks, and compute tasks (which may or may not be graphics-related).
- Display unit 665 may be configured to read data from a frame buffer and provide a stream of pixel values for display.
- Display unit 665 may be configured as a display pipeline in some embodiments. Additionally, display unit 665 may be configured to blend multiple frames to produce an output frame. Further, display unit 665 may include one or more interfaces (e.g., MIPI® or embedded display port (eDP)) for coupling to a user display (e.g., a touchscreen or an external display).
- I/O bridge 650 may include various elements configured to implement: universal serial bus (USB) communications, security, audio, and low-power always-on functionality, for example. I/O bridge 650 may also include interfaces such as pulse-width modulation (PWM), general-purpose input/output (GPIO), serial peripheral interface (SPI), and inter-integrated circuit (I2C), for example. Various types of peripherals and devices may be coupled to device 600 via I/O bridge 650 .
- device 600 includes network interface circuitry (not explicitly shown), which may be connected to fabric 610 or I/O bridge 650 .
- the network interface circuitry may be configured to communicate via various networks, which may be wired, wireless, or both.
- the network interface circuitry may be configured to communicate via a wired local area network, a wireless local area network (e.g., via WiFi), or a wide area network (e.g., the Internet or a virtual private network).
- the network interface circuitry is configured to communicate via one or more cellular networks that use one or more radio access technologies.
- the network interface circuitry is configured to communicate using device-to-device communications (e.g., Bluetooth or WiFi Direct), etc.
- the network interface circuitry may provide device 600 with connectivity to various types of other devices and networks.
- a wearable device 760 such as a smartwatch or a health-monitoring device.
- Smartwatches may implement a variety of different functions—for example, access to email, cellular service, calendar, health monitoring, etc.
- a wearable device may also be designed solely to perform health-monitoring functions, such as monitoring a user's vital signs, performing epidemiological functions such as contact tracing, providing communication to an emergency medical service, etc.
- Other types of devices are also contemplated, including devices worn on the neck, devices implantable in the human body, glasses or a helmet designed to provide computer-generated reality experiences such as those based on augmented and/or virtual reality, etc.
- the applications illustrated in FIG. 7 are merely exemplary and are not intended to limit the potential future applications of disclosed systems or devices.
- Other example applications include, without limitation: portable gaming devices, music players, data storage devices, unmanned aerial vehicles, etc.
- the present disclosure has described various example circuits in detail above. It is intended that the present disclosure cover not only embodiments that include such circuitry, but also a computer-readable storage medium that includes design information that specifies such circuitry. Accordingly, the present disclosure is intended to support claims that cover not only an apparatus that includes the disclosed circuitry, but also a storage medium that specifies the circuitry in a format that is recognized by a fabrication system configured to produce hardware (e.g., an integrated circuit) that includes the disclosed circuitry. Claims to such a storage medium are intended to cover, for example, an entity that produces a circuit design, but does not itself fabricate the design.
- FIG. 8 is a block diagram illustrating an example non-transitory computer-readable storage medium that stores circuit design information, according to some embodiments.
- semiconductor fabrication system 820 is configured to process the design information 815 stored on non-transitory computer-readable medium 810 and fabricate integrated circuit 830 based on the design information 815 .
- Non-transitory computer-readable storage medium 810 may comprise any of various appropriate types of memory devices or storage devices.
- Non-transitory computer-readable storage medium 810 may be an installation medium, e.g., a CD-ROM, floppy disks, or tape device; a computer system memory or random access memory such as DRAM, DDR RAM, SRAM, EDO RAM, Rambus RAM, etc.; a non-volatile memory such as Flash; magnetic media, e.g., a hard drive, or optical storage; registers; or other similar types of memory elements.
- Non-transitory computer-readable storage medium 810 may include other types of non-transitory memory as well or combinations thereof.
- Non-transitory computer-readable storage medium 810 may include two or more memory mediums which may reside in different locations, e.g., in different computer systems that are connected over a network.
- Disclosed embodiments are non-limiting. That is, the disclosed embodiments are not intended to limit the scope of claims that are drafted based on this disclosure, even where only a single example is described with respect to a particular feature.
- The disclosed embodiments are intended to be illustrative rather than restrictive, absent any statements in the disclosure to the contrary. The application is thus intended to permit claims covering disclosed embodiments, as well as such alternatives, modifications, and equivalents that would be apparent to a person skilled in the art having the benefit of this disclosure.
- A recitation of “w, x, y, or z, or any combination thereof” or “at least one of . . . w, x, y, and z” is intended to cover all possibilities involving a single element up to the total number of elements in the set. For example, given the set [w, x, y, z], these phrasings cover any single element of the set (e.g., w but not x, y, or z), any two elements (e.g., w and x, but not y or z), any three elements (e.g., w, x, and y, but not z), and all four elements.
- The phrase “at least one of . . . w, x, y, and z” thus refers to at least one element of the set [w, x, y, z], thereby covering all possible combinations in this list of elements. This phrase is not to be interpreted to require that there be at least one instance of w, at least one instance of x, at least one instance of y, and at least one instance of z.
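The coverage described above can be checked mechanically: the possibilities covered by such a recitation are exactly the non-empty subsets of the set. The following Python sketch (using the same placeholder elements w, x, y, z as the text; the code itself is illustrative, not part of the patent) enumerates them:

```python
from itertools import combinations

# Enumerate every possibility covered by "at least one of w, x, y, and z":
# all non-empty subsets of the set [w, x, y, z].
elements = ["w", "x", "y", "z"]
covered = [
    subset
    for size in range(1, len(elements) + 1)
    for subset in combinations(elements, size)
]

# 4 singles + 6 pairs + 4 triples + 1 full set = 15 possibilities.
print(len(covered))  # 15
print(covered[0])    # ('w',) -- a single element alone satisfies the phrase
```

Note that the singleton `('w',)` appears in the covered list, matching the statement that the phrase does not require at least one instance of each element.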
- Labels may precede nouns or noun phrases in this disclosure.
- Different labels used for a feature (e.g., “first circuit,” “second circuit,” “particular circuit,” “given circuit,” etc.) refer to different instances of that feature unless stated otherwise.
- The labels “first,” “second,” and “third” when applied to a feature do not imply any type of ordering (e.g., spatial, temporal, logical, etc.), unless stated otherwise.
- A determination may be based solely on specified factors or on the specified factors as well as other, unspecified factors.
- An entity described or recited as being “configured to” perform some task refers to something physical, such as a device, a circuit, a system having a processor unit and a memory storing program instructions executable to implement the task, etc. This phrase is not used herein to refer to something intangible.
- Various units/circuits/components may be described herein as performing a set of tasks or operations. It is understood that those entities are “configured to” perform those tasks/operations, even if not specifically noted.
- Circuits may be described in this disclosure. These circuits or “circuitry” constitute hardware that includes various types of circuit elements, such as combinatorial logic, clocked storage devices (e.g., flip-flops, registers, latches, etc.), finite state machines, memory (e.g., random-access memory, embedded dynamic random-access memory), programmable logic arrays, and so on. Circuitry may be custom designed, or taken from standard libraries. In various implementations, circuitry can, as appropriate, include digital components, analog components, or a combination of both. Certain types of circuits may be commonly referred to as “units” (e.g., a decode unit, an arithmetic logic unit (ALU), functional unit, memory management unit (MMU), etc.). Such units also refer to circuits or circuitry.
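The mix of combinatorial logic and clocked storage mentioned above can be sketched behaviorally. The following Python model is a hypothetical illustration only (the two-state machine, its state names, and its input are invented, not taken from the patent): a pure function plays the role of the combinatorial next-state logic, and a variable updated once per loop iteration plays the role of a register updated at a clock edge.

```python
# Hypothetical behavioral sketch of a clocked finite state machine.
def next_state(state: str, req: bool) -> str:
    """Combinatorial logic: compute the next state from current state + input."""
    if state == "IDLE":
        return "BUSY" if req else "IDLE"
    return "IDLE"  # BUSY always returns to IDLE after one cycle

def run(requests):
    """Clocked portion: latch the next state once per 'clock edge'."""
    state, trace = "IDLE", []
    for req in requests:
        state = next_state(state, req)  # the 'register' updates at the edge
        trace.append(state)
    return trace

print(run([False, True, False, True]))  # ['IDLE', 'BUSY', 'IDLE', 'BUSY']
```

The separation into a stateless `next_state` function and a stateful `run` loop mirrors the split between combinatorial logic and clocked storage devices that the paragraph describes.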
- Circuits/units/components and other elements illustrated in the drawings and described herein thus include hardware elements such as those described in the preceding paragraph.
- The internal arrangement of hardware elements within a particular circuit may be specified by describing the function of that circuit.
- A particular “decode unit” may be described as performing the function of “processing an opcode of an instruction and routing that instruction to one or more of a plurality of functional units,” which means that the decode unit is “configured to” perform this function.
- This specification of function is sufficient, to those skilled in the computer arts, to connote a set of possible structures for the circuit.
- Circuits, units, and other elements may be defined by the functions or operations that they are configured to implement.
- The arrangement of such circuits/units/components with respect to each other and the manner in which they interact form a microarchitectural definition of the hardware that is ultimately manufactured in an integrated circuit or programmed into an FPGA to form a physical implementation of the microarchitectural definition.
- The microarchitectural definition is recognized by those of skill in the art as structure from which many physical implementations may be derived, all of which fall into the broader structure described by the microarchitectural definition.
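The decode-unit function described above (process an opcode, route the instruction to a functional unit) can be modeled behaviorally. In the following Python sketch, every opcode, the routing table, and the instruction format are hypothetical illustrations invented for this example; none come from the patent itself.

```python
# Hypothetical behavioral model of a decode unit's function:
# map an instruction's opcode to the functional unit that should receive it.
ROUTE_TABLE = {
    "ADD": "ALU",
    "SUB": "ALU",
    "LOAD": "MMU",
    "STORE": "MMU",
}

def decode(instruction: dict) -> str:
    """Return the name of the functional unit this instruction is routed to."""
    opcode = instruction["opcode"]
    try:
        return ROUTE_TABLE[opcode]
    except KeyError:
        raise ValueError(f"illegal opcode: {opcode}") from None

print(decode({"opcode": "ADD"}))   # ALU
print(decode({"opcode": "LOAD"}))  # MMU
```

Such a functional specification deliberately says nothing about gate-level structure; many different circuit implementations could realize the same opcode-to-unit mapping, which is the point the surrounding paragraphs make.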
- The design information may be specified using a hardware description language (HDL).
- Such an HDL description may take the form of behavioral code (which is typically not synthesizable), register transfer language (RTL) code (which, in contrast to behavioral code, is typically synthesizable), or structural code (e.g., a netlist specifying logic gates and their connectivity).
- The HDL description may subsequently be synthesized against a library of cells designed for a given integrated circuit fabrication technology, and may be modified for timing, power, and other reasons to result in a final design database that is transmitted to a foundry to generate masks and ultimately produce the integrated circuit.
- Some hardware circuits or portions thereof may also be custom-designed in a schematic editor and captured into the integrated circuit design along with synthesized circuitry.
- The integrated circuits may include transistors and other circuit elements.
- The HDL design may be synthesized to a programmable logic array such as a field programmable gate array (FPGA) and may be implemented in the FPGA.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
Description
Claims (20)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/324,800 US12248399B2 (en) | 2021-05-19 | 2021-05-19 | Multi-block cache fetch techniques |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/324,800 US12248399B2 (en) | 2021-05-19 | 2021-05-19 | Multi-block cache fetch techniques |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US20220374359A1 (en) | 2022-11-24 |
| US12248399B2 (en) | 2025-03-11 |
Family
ID=84103729
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US17/324,800 Active 2041-05-27 US12248399B2 (en) | 2021-05-19 | 2021-05-19 | Multi-block cache fetch techniques |
Country Status (1)
| Country | Link |
|---|---|
| US (1) | US12248399B2 (en) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US12204562B2 (en) * | 2023-01-11 | 2025-01-21 | Arm Limited | Data storage structure |
Citations (15)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6256727B1 (en) | 1998-05-12 | 2001-07-03 | International Business Machines Corporation | Method and system for fetching noncontiguous instructions in a single clock cycle |
| US20030088758A1 (en) * | 2001-11-08 | 2003-05-08 | Matthew Becker | Methods and systems for determining valid microprocessor instructions |
| US20040148471A1 (en) * | 2003-01-28 | 2004-07-29 | Sun Microsystems, Inc | Multiprocessing computer system employing capacity prefetching |
| US20070067567A1 (en) * | 2005-09-19 | 2007-03-22 | Via Technologies, Inc. | Merging entries in processor caches |
| US20090198858A1 (en) * | 2008-01-31 | 2009-08-06 | Sony Corporation | Semiconductor memory device and operation method therefor |
| US20090249036A1 (en) * | 2008-03-31 | 2009-10-01 | Lihu Rappoport | Efficient method and apparatus for employing a micro-op cache in a processor |
| US7783840B2 (en) | 2004-08-05 | 2010-08-24 | Fujitsu Limited | Method and apparatus for controlling memory system |
| US20150121009A1 (en) * | 2013-10-30 | 2015-04-30 | Advanced Micro Devices, Inc. | Method and apparatus for reformatting page table entries for cache storage |
| US9892803B2 (en) | 2014-09-18 | 2018-02-13 | Via Alliance Semiconductor Co., Ltd | Cache management request fusing |
| US20180307608A1 (en) * | 2017-04-25 | 2018-10-25 | Shanghai Zhaoxin Semiconductor Co., Ltd. | Processor cache with independent pipeline to expedite prefetch request |
| US20190042422A1 (en) * | 2018-03-21 | 2019-02-07 | Intel Corporation | Cache architecture using way id to reduce near memory traffic in a two-level memory system |
| US20190079874A1 (en) * | 2017-09-13 | 2019-03-14 | Arm Limited | Cache line statuses |
| US20190213707A1 (en) * | 2018-01-10 | 2019-07-11 | Intel Corporation | Scalable memory interface for graphical processor unit |
| US10579531B2 (en) | 2017-08-30 | 2020-03-03 | Oracle International Corporation | Multi-line data prefetching using dynamic prefetch depth |
| US10621100B1 (en) * | 2016-04-07 | 2020-04-14 | Apple Inc. | Unified prefetch circuit for multi-level caches |
Also Published As
| Publication number | Publication date |
|---|---|
| US20220374359A1 (en) | 2022-11-24 |
Similar Documents
| Publication | Title |
|---|---|
| US11467959B1 (en) | Cache arbitration for address translation requests |
| US20250094357A1 (en) | Cache Control to Preserve Register Data |
| US20250086116A1 (en) | Atomic Smashing |
| US12248399B2 (en) | Multi-block cache fetch techniques |
| US12487927B2 (en) | Remote cache invalidation |
| US12169898B1 (en) | Resource allocation for mesh shader outputs |
| US12412233B1 (en) | Color state techniques for graphics processors |
| US20220083377A1 (en) | Compute Kernel Parsing with Limits in one or more Dimensions |
| US12026098B1 (en) | Hardware-assisted page pool grow operation |
| US12141892B1 (en) | Distributed geometry processing and tracking closed pages |
| US11720501B2 (en) | Cache replacement based on traversal tracking |
| US11947462B1 (en) | Cache footprint management |
| US12405891B2 (en) | Graphics processor cache for data from multiple memory spaces |
| US20250104181A1 (en) | Coherency Control for Compressed Graphics Data |
| US11500692B2 (en) | Dynamic buffering control for compute work distribution |
| US20250342645A1 (en) | Mapping Texture Point Samples to Lanes of a Filter Pipeline |
| US12468644B2 (en) | Invalidation of permission information stored by another processor |
| US12498932B1 (en) | Physical register sharing |
| US12360899B2 (en) | Scoreboard for register data cache |
| US12340485B1 (en) | Graphics geometry processing with segmented and non-segmented sets of work |
| US12165251B1 (en) | Mesh shader work distribution |
| US12026108B1 (en) | Latency-based performance state control |
| US12288286B1 (en) | Parse techniques for graphics workload distribution |
| US11210761B1 (en) | Circuitry to determine set of priority candidates |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| AS | Assignment |
Owner name: APPLE INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:YEUNG, WINNIE W.;LI, CHENG;SIGNING DATES FROM 20210511 TO 20210517;REEL/FRAME:056291/0067 |
|
| FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: FINAL REJECTION MAILED |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: FINAL REJECTION MAILED |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: ADVISORY ACTION MAILED |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: FINAL REJECTION MAILED |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE AFTER FINAL ACTION FORWARDED TO EXAMINER |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
|
| STCF | Information on status: patent grant |
Free format text: PATENTED CASE |