CN113039531B - Method, system and storage medium for allocating cache resources - Google Patents

Method, system and storage medium for allocating cache resources

Info

Publication number
CN113039531B
CN113039531B
Authority
CN
China
Prior art keywords
cache
memory
page
level attribute
different
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202080005647.7A
Other languages
Chinese (zh)
Other versions
CN113039531A (en)
Inventor
Vinod Chamarty
Joao Dias
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Google LLC
Original Assignee
Google LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Google LLC filed Critical Google LLC
Priority to CN202311611146.5A priority Critical patent/CN117707998A/en
Publication of CN113039531A publication Critical patent/CN113039531A/en
Application granted granted Critical
Publication of CN113039531B publication Critical patent/CN113039531B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0877Cache access modes
    • G06F12/0882Page mode
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0862Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/0223User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/0292User address space allocation, e.g. contiguous or non contiguous base addressing using tables or multilevel address translation means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0866Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches for peripheral storage systems, e.g. disk cache
    • G06F12/0871Allocation or management of cache space
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0891Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using clearing, invalidating or resetting means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10Address translation
    • G06F12/1009Address translation using page tables, e.g. page table structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10Address translation
    • G06F12/1027Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB]
    • G06F12/1045Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB] associated with a data cache
    • G06F12/1054Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB] associated with a data cache the data cache being concurrently physically addressed
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/544Buffers; Shared memory; Pipes
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for allocating cache resources based on page-level attribute values. In one embodiment, the system includes one or more integrated client devices and a cache. Each client device is configured to generate memory requests, each having a respective physical address and a respective page descriptor for the page to which the physical address belongs. The cache is configured to cache the memory requests of each of the one or more integrated client devices. The cache includes a cache memory having a plurality of ways. The cache is configured to use the page-level attributes of the memory requests' respective page descriptors to distinguish different streams of memory requests and to allocate different portions of the cache memory to the different respective streams.

Description

Method, system and storage medium for allocating cache resources
Cross Reference to Related Applications
The present application claims priority from U.S. provisional application serial No. 62/805,167 entitled "Caching Streams of Memory Requests," filed on February 13, 2019, the entire contents of which are incorporated herein by reference.
Background
The present description relates to systems with integrated circuit devices.
A cache is a device that stores data retrieved from memory or data to be written to memory for one or more different hardware devices in the system. The hardware devices may be different components integrated into a system on a chip (SOC). In this specification, a device that provides a read request and a write request through a cache will be referred to as a client device.
The cache may be used to reduce power consumption by reducing the overall number of requests to main memory. In addition, further power may be saved by placing the main memory, and the data path to main memory, in a low-power state for as long as the client devices can access the data they need in the cache. Cache usage is thus related to overall power consumption: an increase in cache usage yields a decrease in overall power consumption. A device that relies on battery power (e.g., a mobile computing device) can therefore extend its battery life by increasing the cache usage of its integrated client devices.
The cache is typically organized as multiple sets, each having multiple ways. The requested memory address is used to identify the particular set in which a cache line is placed, and if an existing cache line must be replaced, the cache policy determines which way within the set should be replaced. For example, the cache may implement a policy that first replaces the least recently used cache line within the set.
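To make the set/way organization concrete, the following Python sketch shows how a set index might be derived from a physical address and how an LRU policy picks a victim way. The line size, set count, and way count are arbitrary example values, not taken from the patent.

```python
CACHE_LINE_SIZE = 64   # bytes per cache line (illustrative)
NUM_SETS = 1024        # sets in the cache (illustrative)
NUM_WAYS = 8           # ways (lines) per set (illustrative)

def set_index(physical_addr: int) -> int:
    """Drop the offset bits within a line, then take the low bits as the set index."""
    return (physical_addr // CACHE_LINE_SIZE) % NUM_SETS

def choose_victim(lru_order: list) -> int:
    """lru_order lists the ways of one set from most- to least-recently used;
    an LRU policy evicts the way at the tail."""
    return lru_order[-1]

# Two addresses one cache line apart land in adjacent sets:
assert set_index(0x0000) == 0
assert set_index(0x0040) == 1
# With ways ordered [3, 1, 0, 2] from MRU to LRU, way 2 is the victim:
assert choose_victim([3, 1, 0, 2]) == 2
```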
Caches are typically low-level hardware devices that have no visibility into the interpretation of the addresses being cached. In other words, conventional caches have no mechanism for determining what an address is for or what type of data is stored at the address. Because unrelated streams of data requests may compete for the same cache resources, this can make cache performance inefficient, producing fewer cache hits, more cache misses, and consequently more trips to main memory.
Disclosure of Invention
This specification describes techniques for implementing a caching policy driven by streams of related memory requests (also referred to herein as "data streams"). In this specification, a stream is a plurality of memory requests that are related to each other in software. For example, a stream may include all instruction requests for the same software driver. A stream may also include all data requests for the same software driver.
The cache may identify a data stream by examining page-level attributes common to different memory requests. The cache may then allocate different portions of the cache memory to different data streams. Thus, for example, instruction requests may be allocated to different portions of the cache than data requests. This capability allows the cache to allocate cache portions based on attributes of the data, rather than just address ranges, page identifiers, or the identity of the requesting client device or client driver.
Particular embodiments of the subject matter described in this specification can be implemented to realize one or more of the following advantages. Using page-level attributes to determine related streams of memory requests can improve cache performance and utilization. The cache may reduce contention for cache resources among different memory request streams, thereby increasing the cache hit rate. In mobile devices that rely on battery power, increasing the cache hit rate reduces power consumption and extends battery life. In addition, by using page-level attributes, the cache can allocate the same portion of the cache to client devices having producer/consumer relationships, effectively increasing the amount of cache resources available to the overall system.
The details of one or more embodiments of the subject matter of this specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages of the subject matter will become apparent from the description, the drawings, and the claims.
Drawings
FIG. 1 is a block diagram of an example system according to an embodiment of the present disclosure.
FIG. 2 is a flowchart of an example process for assigning page-level attribute values to allocated memory pages according to an embodiment of the present disclosure.
FIG. 3 is a flowchart of an example process for allocating cache ways according to an embodiment of the present disclosure.
FIG. 4 is a flowchart of an example process for servicing a memory request using a cache portion dedicated to the memory request stream, according to an embodiment of the present disclosure.
Like reference numbers and designations in the various drawings indicate like elements.
Detailed Description
Fig. 1 is a diagram of an example system 100. The system 100 includes a system on a chip (SOC) 102 communicatively coupled to a memory device 140. The SOC 102 has a plurality of client devices 110a, 110b to 110n using a cache 120, the cache 120 being arranged in a data path to a memory 140. In this example, since cache 120 caches data requests for multiple client devices in a single SOC 102, cache 120 may be referred to as a system level cache. However, the same techniques described below may also be used for other caches that cache memory requests of only a single client device or software driver.
SOC 102 is an example of a device that may be installed on or integrated into any suitable computing device, which may be referred to as a host device. Because the techniques described in this specification are particularly suited to reducing the power consumption of a host device, the SOC 102 may be especially beneficial when installed in a mobile host device that depends on battery power (e.g., a smart phone, a smart watch or another wearable computing device, a tablet computer, or a laptop computer, to name a few examples).
Cache 120 is an example of a cache that may implement a cache policy that allocates cache resources by identifying different memory request flows through cache 120. For example, the first and second cache portions of the cache 120 may be allocated to two different streams, respectively. The example SOC 102 is illustrated with one system level cache 120. However, the SOC may have multiple caches, each of which may or may not be a system level cache.
SOC 102 has a plurality of client devices 110a through 110n. Each of the client devices 110a through 110n may be any suitable module, device, or functional component configured to read data from and store data in the memory device 140 through the SOC fabric 150. For example, a client device may be a CPU, an application-specific integrated circuit, or a lower-level component of the SOC 102 itself, each capable of initiating communication through the SOC fabric 150.
Each client device 110a through 110n includes a respective address translation unit (ATU), e.g., ATUs 112a through 112n. The ATUs 112a through 112n are responsible for translating virtual addresses provided by processes executing on the client devices into physical addresses in the memory device 140 that store, or will store, the corresponding data. In some implementations, one or more of the ATUs 112a through 112n are implemented as memory management units configured to perform address translation using hardware page-table walks over page tables that store virtual-to-physical address translations.
Regardless of the implementation, when the ATUs 112 a-112 n receive a memory request, the ATUs perform address translation and generate a memory request having the resulting physical address and a page descriptor for the page in memory to which the physical address belongs. As part of this process, each ATU 112 a-112 n may be configured to populate a page descriptor with page-level attributes of the page in memory to which the physical address belongs.
The page descriptor may have values for a plurality of attributes relating to various aspects of the virtual memory system. For example, the page descriptor may specify table-level attributes relating to the page table itself and address-level attributes relating to the respective physical or virtual address. The page-level attribute values may thus be a subset of the page descriptor that the ATU generates and includes in the memory request. The page-level attributes may be page-based hardware attributes of the page descriptor. Each page-level attribute value specifies a value for an attribute that is specific to the page to which the physical address belongs.
The values of the page-level attributes may be assigned by the software drivers of the client devices 110a through 110n. For example, when a software driver executing for a client device requests allocation of pages in the memory 140, the software driver may instruct the ATU to assign particular page-level attributes to those pages. Each client device may execute one or more processes, e.g., processes 114a through 114b, and each process may assign its own page-level attributes as part of a memory allocation request. These processes then generate virtual addresses to be translated by the corresponding ATUs, and the corresponding ATUs translate the virtual addresses into physical addresses and include the previously assigned page-level attributes in the generated memory requests.
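The assign-then-translate flow above can be sketched in Python. The page-table layout, attribute encoding, and function names here are illustrative assumptions, not the patent's actual data structures.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096

@dataclass
class MemoryRequest:
    physical_addr: int
    page_attr: int      # page-level attribute carried in the page descriptor

# Toy page table: virtual page number -> (physical page number, page-level attribute)
page_table = {}

def allocate_page(vpn: int, ppn: int, attr: int):
    """The software driver assigns a page-level attribute at allocation time."""
    page_table[vpn] = (ppn, attr)

def translate(virtual_addr: int) -> MemoryRequest:
    """ATU: translate the virtual address and surface the page's attribute
    in the generated memory request."""
    vpn, offset = divmod(virtual_addr, PAGE_SIZE)
    ppn, attr = page_table[vpn]
    return MemoryRequest(ppn * PAGE_SIZE + offset, attr)

allocate_page(vpn=5, ppn=42, attr=7)    # e.g., a data page tagged with attribute 7
req = translate(5 * PAGE_SIZE + 0x10)
assert req.physical_addr == 42 * PAGE_SIZE + 0x10
assert req.page_attr == 7
```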
For example, FIG. 1 illustrates a memory 140 having a plurality of allocated pages 132a, 132b, 132c, 132d, and 132e. Each of the pages has been assigned a respective page level attribute value by the particular software driver that allocated the page in memory 140. For example, the memory 140 has an instruction page 132a for the first process 114a and an instruction page 132b for the second process 114b executing on the first client device 110 a. Memory 140 also has two pages 132c and 132e of data for second client 110 b. And memory 140 has a page table page 132d for first client 110 a.
In this example, the cache 120 can make a number of distinctions among these pages simply by examining the assigned page-level attributes. For example, the cache 120 can distinguish the instructions of the first process 114a from those of the second process 114b. Thus, the cache 120 can allocate different cache resources to memory requests for these pages so that the two processes do not compete with each other for cache resources.
In addition, the cache 120 can determine that the two data pages 132c and 132e belong to the same data stream because they have the same page-level attribute value of 7. Thus, the cache 120 can allocate the same cache resources to memory requests for either of these pages.
Finally, cache 120 may distinguish between a page having instructions for the process on first client 110a and a page having page table data for ATU 112a by examining page level attributes of pages 132a, 132b, and 132 d.
SOC fabric 150 is the communication subsystem of the SOC 102. The SOC fabric 150 includes communication pathways that allow the client devices 110a through 110n to communicate with each other and to make requests to read and write data in the memory device 140. The SOC fabric 150 may include any suitable combination of communication hardware, such as a bus or dedicated interconnect circuitry.
The system 100 also includes a communication pathway that allows communication between the cache 120 and the memory controller 130 and an inter-chip communication pathway that allows communication between the memory controller 130 and the memory device 140. In some implementations, SOC 102 may conserve power by powering down one or more communication pathways. Alternatively or additionally, in some implementations, SOC 102 may power down memory device 140 to further conserve power. As another example, SOC 102 may enter a clock cut-off mode in which respective clock circuits are powered down for one or more devices.
Cache 120 is located in the data path between the SOC fabric 150 and the memory controller 130. The memory controller 130 handles requests to and from the memory device 140. Thus, requests from the client devices 110a through 110n to read from or write to the memory device 140 pass through the cache 120. For example, client 110a may make a request to read from the memory device 140, which passes through the SOC fabric 150 to the cache 120. The cache 120 may handle the request before forwarding it to the memory controller 130 of the memory device 140.
Cache 120 may cache read requests, write requests, or both from the client devices 110a through 110n. The cache 120 can serve read requests from client devices by responding with data stored in the cache rather than retrieving the data from the memory device 140. Similarly, the cache 120 can serve write requests from client devices by writing the new data to the cache instead of to the memory device 140. The cache 120 may perform a write-back at a later time to write the updated data to the memory device 140.
The cache 120 may have a dedicated cache memory, which may be implemented using dedicated registers or high-speed random access memory. The cache 120 may implement a cache policy that allocates different portions of the cache memory (e.g., ways) to different respective memory request streams. Thus, memory requests belonging to the same stream may be handled using the same allocated portion of the cache memory. To do so, the cache 120 may examine specific page-level attributes of the page descriptors included in memory requests to determine which pages belong to the same memory request stream. This allows the cache 120 to determine that physical addresses belonging to different pages belong to the same stream of memory requests.
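A minimal way to model the way-partitioning policy just described is a table from page-level attribute value to reserved ways. The attribute values and way assignments below are hypothetical examples, not values from the patent.

```python
# Configuration: page-level attribute value -> set of ways reserved for that stream
way_allocation = {
    1: {0, 1},         # e.g., an instruction stream tagged with attribute 1
    7: {2, 3, 4, 5},   # e.g., a data stream tagged with attribute 7
}
DEFAULT_WAYS = {6, 7}  # all other traffic shares the remaining ways

def ways_for_request(page_attr: int) -> set:
    """Return the ways a request may use, based on its page-level attribute."""
    return way_allocation.get(page_attr, DEFAULT_WAYS)

# Requests to different pages that carry the same attribute share one partition:
assert ways_for_request(7) == {2, 3, 4, 5}
# A request with an unconfigured attribute falls back to the shared ways:
assert ways_for_request(9) == {6, 7}
```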
One example of these techniques includes assigning different portions of a cache to different processes executing on the same client device. For example, the cache 120 may examine page descriptors of incoming memory requests to determine that some requests are related to pages owned by the first process 114a and some other requests are related to pages owned by the second process 114b. Thus, to prevent the two processes from competing for cache resources with each other, the cache 120 may allocate a first portion of the cache to a first process 114a executing on the client device 110a and may allocate a second portion of the cache to a second process 114b executing on the same client device.
Another example includes assigning different portions of the cache to different buffers. For example, when the SOC is a Graphics Processing Unit (GPU), each client device may perform different functions in the graphics processing pipeline. Thus, different data streams may be identified for the render buffer, texture buffer, and vertex buffer, to name a few examples.
The cache 120 may also be configured to implement even more sophisticated caching behavior. For example, if the first process 114a is a producer process whose data is consumed by the second process 114b, the cache 120 may examine the page descriptors of incoming memory requests to determine that both the producer process and the consumer process are active. In that case, the cache 120 may allocate a single portion of the cache to both processes and invalidate each cache line whenever it is read. In some implementations, the cache applies this replacement policy only to consumer processes that do not perform speculative reads. This configuration can cause all write requests from the producer process and all read requests from the consumer process to result in cache hits, because invalidating every cache line as it is read means that the cache footprint of the processes does not keep growing. This in turn allows the cache 120 to allocate smaller portions of the cache memory to producer and consumer processes, which further improves cache performance by freeing up cache resources for use by other devices and processes. In addition, this caching behavior for producer-consumer processes saves power, because the invalidated cache lines never need to be written back to memory after the consumer process reads them.
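The producer/consumer behavior can be modeled as follows. This is a simplified sketch in which the shared partition drops a line as soon as the consumer reads it; real hardware would track per-line valid bits rather than deleting entries.

```python
class ProducerConsumerPartition:
    """Model of a cache partition shared by a producer and a consumer:
    lines written by the producer are invalidated as soon as the consumer
    reads them, so the footprint stays bounded and invalidated lines
    never need a write-back."""

    def __init__(self):
        self.lines = {}              # addr -> data; only valid lines are kept

    def produce(self, addr, data):
        self.lines[addr] = data      # producer write hits in the cache

    def consume(self, addr):
        return self.lines.pop(addr)  # consumer read hits, then invalidates

part = ProducerConsumerPartition()
part.produce(0x100, "frame-0")
assert part.consume(0x100) == "frame-0"
assert 0x100 not in part.lines       # line invalidated after the read
```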
Cache 120 may use the controller pipeline to handle memory requests from SOC structure 150. The controller pipeline implements cache logic to determine whether data is present in the cache or whether data needs to be fetched from or written to memory. Thus, when memory access is required, such as upon a cache miss, the controller pipeline may also provide a transaction to the memory controller 130.
The cache 120 may be configured by writing a plurality of different page-level attribute values. The attribute values may be written to reserved configuration registers of the cache 120, e.g., special function registers. The cache 120 may interpret the attribute values present in the configuration registers as indicating that dedicated cache resources should be allocated: each attribute value in the configuration registers corresponds to a separate data stream for which a dedicated portion of the cache should be allocated.
In some implementations, the attribute values may specify wildcard bits so that the cache 120 can match on attribute patterns rather than only on fully specified attribute values. An attribute pattern may include wildcard bits that match any value. For example, an attribute pattern may be XX10, where each X may be 1 or 0. Thus, the following bit sequences all match the example attribute pattern: 0010, 1110, and 1010.
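The wildcard matching just described is equivalent to a mask/value comparison, sketched here in Python. The string pattern syntax follows the XX10 example in the text; the function names are illustrative.

```python
def compile_pattern(pattern: str):
    """Turn a pattern like 'XX10' into a (mask, value) pair: mask bits are 1
    where the pattern is a literal 0/1 and 0 where it is a wildcard X."""
    mask = int("".join("0" if c == "X" else "1" for c in pattern), 2)
    value = int(pattern.replace("X", "0"), 2)
    return mask, value

def matches(attr: int, mask: int, value: int) -> bool:
    """An attribute matches if its non-wildcard bits equal the pattern's bits."""
    return (attr & mask) == value

mask, value = compile_pattern("XX10")
# 0010, 1110, and 1010 all match the pattern; 0011 does not:
assert all(matches(a, mask, value) for a in (0b0010, 0b1110, 0b1010))
assert not matches(0b0011, mask, value)
```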
An allocation engine of the cache 120 may be configured to allocate portions of the cache using the attribute values written to the configuration space of the cache 120. For example, the allocation engine may allocate a first portion of the cache for memory requests having page-level attributes that match a first attribute value and a second portion of the cache for memory requests having page-level attributes that match a second attribute value. The attribute values may be preloaded into the cache 120 at manufacture time or written dynamically as memory requests are serviced.
The allocation engine may generate a final cache configuration that the controller pipeline uses to service memory requests from the SOC fabric 150. In particular, the final cache configuration may specify which ways of the cache memory are allocated to which streams so that the controller pipeline can determine which ways to use when servicing incoming memory requests. Alternatively or additionally, the cache 120 may maintain a quota for each portion and use hardware counters to track how much of each quota has been used.
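A per-partition quota might be tracked as sketched below. This is a hypothetical software model of what the text describes as hardware counting; the interface is invented for illustration.

```python
class PartitionQuota:
    """Track how many lines each stream's partition currently holds.
    A fill that would exceed a partition's quota must first evict a line
    from that same partition."""

    def __init__(self, quotas):
        self.quotas = quotas                       # attr -> max lines allowed
        self.used = {attr: 0 for attr in quotas}   # attr -> lines in use

    def can_fill(self, attr) -> bool:
        return self.used[attr] < self.quotas[attr]

    def on_fill(self, attr):
        self.used[attr] += 1

    def on_evict(self, attr):
        self.used[attr] -= 1

q = PartitionQuota({7: 2})
q.on_fill(7)
q.on_fill(7)
assert not q.can_fill(7)   # quota exhausted: must evict within partition 7
q.on_evict(7)
assert q.can_fill(7)
```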
The allocation engine may use dedicated hardware circuitry of the cache 120 to perform the allocation techniques described below. Alternatively or additionally, the allocation process may be implemented in software and the allocation engine may cause the CPU of the host device to execute the allocation algorithm.
FIG. 2 is a flow chart of an example process for assigning page level attribute values to allocated memory pages. This example process may be performed by any suitable client device executing a particular software module capable of modifying page table entries. For example, the client device may execute a modified operating system that modifies page table entries according to different types of data streams. Alternatively or additionally, the client device may execute client application software that modifies page table entries according to different types of data streams. For convenience, the example process is described as being performed by a client device, suitably programmed in accordance with the present description.
The client device allocates a memory region of one or more pages for a software driver (210). The software driver is a software module executing on the client device. The client device may allocate the memory region before or after start-up of the software driver, or at some other point during execution when the software driver makes a new request to allocate memory.
The client device determines page level attributes of one or more pages of the memory region (220). The modified OS of the client device may automatically differentiate between different kinds of pages at the time of memory allocation. For example, the OS may differentiate between instruction pages and data pages.
Alternatively or in addition, the OS of the client device may support an application programming interface (API) that allows software drivers to assign specific page-level attributes at the time of memory allocation. The OS need not have any knowledge of the interpretation behind the values of the page-level attributes. Instead, the software driver itself may be programmed to assign page-level attributes for a particular use case. For example, the software driver may assign one page-level attribute to instruction pages, a different page-level attribute to data pages, and yet another page-level attribute to page-table data pages.
The client device modifies the page table to associate page level attributes with one or more allocated pages of the requested memory region (230). In other words, the client device stores an association between each allocated page and each assigned page-level attribute value in the page table.
FIG. 3 is a flow chart of an example process for allocating cache ways. The example process may be performed by one or more components of the cache. The example process is described as being performed by an allocation engine of a cache on an SOC, suitably programmed in accordance with the present description.
The allocation engine identifies a stream of memory requests for allocation (310). The allocation engine may identify the flow in a number of ways. As described above, the cache may have configuration registers configured before or after manufacture that specify attribute values or attribute patterns, and the allocation engine may interpret each attribute value or pattern as a separate stream requiring dedicated cache memory.
Alternatively or additionally, the cache may identify the memory request stream by monitoring memory traffic. For example, the cache may maintain statistics of the most common values of page-level attributes in all memory requests, and may allocate private portions of the cache for the most common page-level attribute values.
Many different events may trigger the cache to begin the allocation process by identifying a memory request stream. For example, the cache may perform allocation at boot time. As another example, the SOC may be configured to automatically generate a repartitioning trigger event when it detects a change in execution or usage. The trigger event may be a signal or data received by the system indicating that the configuration registers have been modified and that portions of the cache need to be reallocated.
The allocation engine identifies page-level attribute values associated with the data stream (320). As described above, each memory request may include a page descriptor generated from the address translation process, which may include a page-level attribute value.
The memory request stream may be associated with more than one page-level attribute value. For example, the configuration registers may also specify an association with a stream identifier. The cache may then use the specified association, rather than just the occurrence of a page-level attribute value in the configuration registers, to identify the page-level attribute values of a stream. When there are multiple page-level attribute values, the cache may repeat (at least in part) the example process for each of the identified values. In addition, as described above, a page-level attribute value may specify wildcard bits so that the cache effectively matches on an attribute pattern.
The allocation engine allocates a portion of the cache to requests having the page-level attribute value (330). The allocation engine may allocate any suitable portion of the cache, such as one or more lines, sets, ways, or some combination of these. In some implementations, portions are exclusively allocated, such that only memory requests having the specified page-level attribute values may use the allocated cache resources.
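A minimal sketch of exclusive, way-granular allocation might look like the following; the class and its interface are invented for illustration and are not the disclosed implementation:

```python
class WayAllocator:
    """Toy way-partitioned allocator: dedicates cache ways to
    page-level attribute values, exclusively."""

    def __init__(self, num_ways):
        self.free_ways = list(range(num_ways))
        self.partitions = {}  # attribute value -> list of way indices

    def allocate(self, attr_value, count):
        # Exclusively reserve `count` ways for this attribute value.
        if count > len(self.free_ways):
            raise ValueError("not enough free ways")
        ways = [self.free_ways.pop(0) for _ in range(count)]
        self.partitions[attr_value] = ways
        return ways

    def ways_for(self, attr_value):
        # Requests without a dedicated partition get None here; a real
        # cache would apply its default policy instead.
        return self.partitions.get(attr_value)
```

The same structure could allocate sets or individual lines instead of ways; only the granularity of the reserved resource changes.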
As described above, the allocation process may distinguish between different types of pages based on page-level attribute values. For example, the allocation engine may distinguish an instruction stream from a data stream, and may allocate one portion of the cache to the instruction stream and another portion of the cache to the data stream.
Additionally, the allocation engine may distinguish a first instruction stream executed by a client device from a second instruction stream executed by the same client device or a different second client device, and may allocate different portions of the cache to different instruction streams.
In addition, the allocation engine may distinguish between different types of data pages. For example, the allocation engine may allocate different portions of the cache to different data pages or to pages belonging to different memory buffers. In some implementations, the allocation engine may assign special priority to pages storing a particular type of data structure, and may allocate different amounts of cache resources to each page. For example, page tables are data buffers that have a significant impact on cache utilization. Thus, the allocation engine may treat data buffers storing page table data differently from buffers storing other kinds of data. For example, the allocation engine may allocate 1 MB of cache memory for page table pages and 4 KB of cache memory for other types of data buffers.
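The differential sizing in this example can be expressed as a trivial policy function; the attribute encoding below is hypothetical:

```python
PAGE_TABLE_ATTR = 0x1  # hypothetical attribute value marking page-table pages

def allocation_bytes(page_attr):
    # Mirror the example above: give page-table pages a much larger cache
    # budget (1 MB) than ordinary data buffers (4 KB).
    return 1 * 1024 * 1024 if page_attr == PAGE_TABLE_ATTR else 4 * 1024
```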
The cache then services the memory request from the client device on the SOC based on the page-level attribute value in the page descriptor of the memory request (340). In so doing, the cache may effectively dedicate portions of the cache to different memory request streams.
FIG. 4 is a flow chart of an example process for servicing a memory request using a cache portion dedicated to the memory request stream. The example process may be performed by one or more components of the cache. The example process is described as being performed by a cache on an SOC (e.g., cache 120 of fig. 1).
The cache receives a memory request (410). The memory request may be generated by an address translation unit of a particular client device and may include a physical address and a page descriptor generated during an address translation process.
The cache determines whether a cache line has already been allocated for the request (420). If so, the cache may bypass the check of page-level attributes entirely.
If a cache line has been allocated for the request, the cache may service the request using the previously allocated cache line (430). Note that in such a scenario, even though a particular stream of memory requests has a dedicated cache portion, if the request has previously been cached in a different portion, the cache may use that different portion to service the memory request. In some implementations, to increase the hit rate of the cache, the system may move the cache line from the previously allocated portion to the portion dedicated to the memory request stream.
If a cache line is not allocated for the request (420), the cache identifies a page-level attribute value for a page descriptor associated with the memory request (440). The page descriptor may include one or more attributes of the translation process, such as an address of a page table associated with the memory request. The cache may identify page-level attributes of page descriptors associated with pages to which the physical address belongs.
The cache determines whether the page level attribute value has a private cache portion (450). For example, the cache may compare the page level attribute value to one or more attribute values stored in a configuration register. In response to determining that the page level attribute value has a private cache portion, the cache services the memory request by using the private cache portion (460). Otherwise, the cache services the memory request using a default cache policy (470).
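Steps 410 through 470 can be summarized in a short dispatch sketch; the data structures here (`line_index`, `private_portions`) are illustrative assumptions, not the disclosed hardware:

```python
def service_request(request, private_portions, default_cache, line_index):
    # Step 420/430: reuse an already-allocated line if one exists.
    addr = request["addr"]
    if addr in line_index:
        return ("existing_line", line_index[addr])
    # Step 440: otherwise, consult the page-level attribute value.
    attr = request["page_attr"]
    # Step 450/460: use a dedicated portion if one exists for this value.
    if attr in private_portions:
        return ("private_portion", private_portions[attr])
    # Step 470: fall back to the default cache policy.
    return ("default_policy", default_cache)
```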
Thus, the cache may identify different memory request streams and allocate different portions of cache memory to the respective streams based on their respective page-level attribute values. Additionally, the cache may assign different replacement policies to different memory request streams using page-level attribute values.
For example, the cache may allocate some portions of the cache memory to "write" instructions and some other portions of the cache memory to "read" instructions. For example, a portion of the cache memory may be marked as WRITE and allocated to a data stream having a WRITE attribute, and a portion of the cache memory may be marked as READ and allocated to a data stream having a READ attribute to reduce over-fetching from dynamic random access memory (DRAM, e.g., memory 140). If the incoming memory request includes a WRITE attribute and the cache determines that the WRITE memory portion of the cache memory is dedicated to the requested WRITE attribute, the cache services the request by writing the requested data to the WRITE portion of the cache memory. For example, a WRITE memory portion may be associated with a WRITE attribute value. The cache may compare the page level attribute value of the incoming memory request to the WRITE attribute value to determine whether the WRITE memory portion is dedicated to the incoming memory request. Similarly, if the incoming memory request includes a READ attribute and the cache determines that the READ cache memory portion is dedicated to a request having a READ attribute, the cache may fetch a line from the READ portion of the cache memory and need not forward the request to the memory controller.
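A hedged sketch of this READ/WRITE routing follows; the attribute encodings are hypothetical:

```python
READ_ATTR, WRITE_ATTR = 0x1, 0x2  # hypothetical attribute encodings

def route_request(page_attr):
    # WRITE-attribute requests land in the WRITE portion of cache memory;
    # READ-attribute requests are served from the READ portion (no DRAM
    # fetch needed); anything else is forwarded to the memory controller.
    if page_attr == WRITE_ATTR:
        return "write_portion"
    if page_attr == READ_ATTR:
        return "read_portion"
    return "memory_controller"
```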
In some implementations, the cache may also use page-level attributes to increase cache utilization while preventing over-fetching of data. For example, some client devices operate on compressed data, and page-level attributes may be used to determine which pages store compressed data. For example, the GPU may be reading and writing compressed frame buffer data using frame buffer compression. In these cases, the cache need not allocate a full cache line for reads and writes from the client device. Instead, the cache may allocate only a portion of a cache line, depending on the compression level used. For example, if the compressed data is only half the size of a cache line, e.g., 32 bytes of a 64-byte line, the cache may effectively allocate two different addresses in the same cache line. That is, if the page-level attribute of an incoming read request indicates that the page stores compressed data, the cache may perform a partial read and store the data in only half of the cache line. In some implementations, the cache manages dirty bits on a whole-cache-line basis. Thus, if any portion of a cache line is modified, the cache may need to write the complete cache line back to memory, even the unmodified portion.
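Under the stated assumption of 64-byte lines and 2:1 compression, the partial-fill decision can be sketched as follows; the function name and return shape are invented for illustration:

```python
LINE_SIZE = 64  # assumed cache line size in bytes

def fetch_plan(addr, page_is_compressed):
    # Compressed pages occupy only half a line, so two different addresses
    # can share one cache line: the low half and the high half.
    if page_is_compressed:
        line = addr // LINE_SIZE
        slot = (addr % LINE_SIZE) // (LINE_SIZE // 2)  # 0 = low half, 1 = high half
        return {"bytes": LINE_SIZE // 2, "line": line, "slot": slot}
    # Uncompressed pages fill a whole line as usual.
    return {"bytes": LINE_SIZE, "line": addr // LINE_SIZE, "slot": None}
```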
The cache may also convert the requested page level attribute value to a Partition Identifier (PID) of the cache using a partition identifier based cache policy. Typically, a partition is a portion of a cache resource that is allocated for caching requests for a particular memory region. Typically, there is a correspondence between a partition and a memory region that is used by one or more client devices accessing the memory region.
To perform the translation, the cache may use the page-level attribute value and the identifier of the requesting device to find the partition identifier for servicing the request. This provides a level of indirection between the page level attributes and the actual allocation and replacement policies for the caches of these pages. This feature may be used to partition partitions for different buffers (e.g., different CPU or GPU buffers as described above) based on their page level attributes.
In some implementations, the least significant bits of the partition identifier are replaced with one or more bits from the page-level attribute value. To support this feature, the client device may be restricted to setting only the most significant bits of the partition identifier. For example, if the cache supports 64 memory partitions, client devices may be restricted to enumerating only 32 partition identifiers.
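This bit splicing can be sketched as follows, assuming a single attribute bit replaces the lowest PID bit; the names are illustrative:

```python
ATTR_BITS = 1  # number of low PID bits taken from the page-level attribute

def effective_pid(client_pid, page_attr):
    # Replace the least significant ATTR_BITS bits of the client-supplied
    # PID with bits from the page-level attribute value. With 64 partitions
    # and ATTR_BITS = 1, clients effectively enumerate only 32 identifiers.
    mask = (1 << ATTR_BITS) - 1
    return (client_pid & ~mask) | (page_attr & mask)
```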
The cache may also use page-level attributes to resolve partition naming conflicts between multiple client devices. If multiple client devices share the same buffer, e.g., in the producer/consumer context mentioned above, each client device may reference the same cache partition by a different identifier. The page level attribute may allow the cache to treat the buffer as a single partition.
To do so, the cache may use special function registers to maintain a mapping between the page level attribute mask and the local Partition Identifiers (PIDs) used only within the cache. Thus, when a request is entered, the cache may convert the page level attributes to a local PID. The cache may then service the request using the local PID instead of the PID supplied by the client device. Thus, when a cache serves requests from two different client devices referencing the same buffer with different PIDs, the cache will map different external PIDs to the same local PID based on the page level attributes of the buffer. The client device may still be responsible for enabling or disabling the partition depending on the execution context of the system using an external PID.
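A toy model of this special-function-register mapping follows; the class name and register representation are invented for illustration, not the disclosed hardware:

```python
class LocalPidMapper:
    """Translate a request's page-level attribute to a cache-local PID,
    ignoring the external PID the client supplied when a mapping matches."""

    def __init__(self):
        self.sfr = []  # list of (attr_value, attr_mask, local_pid)

    def add_mapping(self, attr_value, attr_mask, local_pid):
        self.sfr.append((attr_value, attr_mask, local_pid))

    def local_pid(self, page_attr, external_pid):
        for value, mask, pid in self.sfr:
            if (page_attr & mask) == (value & mask):
                return pid       # shared buffer: same local PID for all clients
        return external_pid      # no mapping: fall back to the client's PID
```

Two client devices that reference the same buffer with different external PIDs thus converge on one local PID, because the mapping keys on the buffer's page-level attribute rather than on the client-supplied identifier.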
Embodiments of the subject matter and the functional operations described in this specification can be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware (including the structures disclosed in this specification and their structural equivalents), or in combinations of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible, non-transitory storage medium for execution by, or to control the operation of, data processing apparatus. The computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them. Alternatively or additionally, the program instructions may be encoded on a manually-generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus for execution by data processing apparatus.
The term "data processing apparatus" refers to data processing hardware and encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus may also be or further comprise dedicated logic circuitry, such as an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit). In addition to hardware, the apparatus may optionally include code that creates an execution environment for the computer program, such as code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.
A computer program (which may also be referred to or described as a program, software application, app, module, software module, script, or code) can be written in any form of programming language, including compiled or interpreted languages, or declarative languages or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A program may, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a data communication network.
For a system of one or more computers to be configured to perform a particular operation or action means that the system has installed on it software, firmware, hardware, or a combination thereof that, in operation, causes the system to perform the operation or action. For one or more computer programs to be configured to perform a particular operation or action means that the one or more programs include instructions that, when executed by data processing apparatus, cause the apparatus to perform the operation or action.
As used in this specification, "engine" or "software engine" refers to a hardware-implemented or software-implemented input/output system that provides an output that is different from an input. The engine may be implemented in dedicated digital circuitry or as computer readable instructions to be executed by a computing device. Each engine may be implemented on any suitable type of computing device (e.g., a server, mobile phone, tablet computer, notebook computer, music player, e-book reader, laptop or desktop computer, PDA, smart phone, or other stationary or portable device) that includes one or more processors and computer-readable media. Additionally, two or more engines may be implemented on the same computing device or on different computing devices.
The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, or in combination with, special purpose logic circuitry (e.g., an FPGA or ASIC) or one or more programmed computers.
A computer suitable for executing a computer program may be based on a general-purpose or special-purpose microprocessor or both or any other kind of central processing unit. Typically, the central processing unit will receive instructions and data from a read only memory or a random access memory or both. The essential elements of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data. The central processing unit and the memory may be supplemented by, or incorporated in, special purpose logic circuitry. Typically, a computer will also include one or more mass storage devices (e.g., magnetic, magneto-optical, or optical) for storing data, or a computer will be operatively coupled to receive data from the mass storage device or transfer data to the mass storage device or both. However, a computer need not have such a device. Moreover, a computer may be embedded in another device, such as a mobile phone, a Personal Digital Assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device (e.g., a Universal Serial Bus (USB) flash drive), to name a few.
Computer readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including by way of example semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices); magnetic disks (e.g., internal hard disks or removable disks); magneto-optical disks; and CD-ROM and DVD-ROM disks.
To provide for interaction with a user, embodiments of the subject matter described in this specification can be implemented on a host device having: a display device for displaying information to a user, such as a CRT (cathode ray tube) or LCD (liquid crystal display) monitor; and a keyboard and pointing device, such as a mouse, trackball, or presence-sensitive display or other surface, by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback, such as visual feedback, auditory feedback, or tactile feedback; and input from the user may be received in any form, including acoustic input, speech input, or tactile input. In addition, the computer may send and receive documents to and from the device used by the user; for example, by sending a web page to a web browser on the user's device in response to a request received from the web browser. Moreover, the computer may interact with the user by sending text messages or other forms of messages to a personal device (e.g., a smart phone) running a messaging application and receiving a response message from the user in return.
while this specification contains many specific implementation details, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, although operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Specific embodiments of the present subject matter have been described. Other embodiments are within the scope of the following claims. For example, the actions recited in the claims can be performed in a different order and still achieve desirable results. As one example, the processes depicted in the accompanying drawings do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous.

Claims (20)

1. A system for allocating cache resources, comprising:
one or more integrated client devices, each client device configured to generate memory requests, each memory request having a respective physical address and a respective page-level attribute of a respective page of a first memory to which the physical address belongs; and
a cache configured to cache memory requests to the first memory for each of the one or more integrated client devices,
wherein the cache includes a cache memory having a plurality of ways, the cache memory being separate from the first memory, and
wherein the cache is configured to distinguish between different memory requests using page level attributes of physical pages in the first memory of the memory requests and to allocate different portions of the cache memory to different respective memory requests.
2. The system of claim 1, wherein each client device has a respective address translation module configured to translate virtual addresses received from the software driver into respective memory requests having respective physical addresses and respective page descriptors, and wherein each client device is configured to update the page table to assign particular page level attribute values to particular pages.
3. The system of claim 2, wherein the cache is configured to distinguish between different memory requests using a particular page-level attribute of a page descriptor generated by the corresponding address translation module of the client device.
4. The system of claim 3, wherein the address translation module is a memory management unit configured to perform a hardware walk of a page table in the first memory to perform address translation.
5. The system of claim 1, wherein the cache is configured to identify physical addresses occurring on different pages as part of the same memory request stream.
6. The system of claim 1, wherein a first page-level attribute is included in a memory request that is part of a first instruction stream executed by a client device and a second page-level attribute, different from the first page-level attribute, is included in a memory request that is part of a second data stream used by the client device, and
wherein the cache is configured to distinguish the first instruction stream from the second data stream based on the first page level attribute and the second page level attribute to allocate a first portion of the cache memory to the first instruction stream and a second portion of the cache memory to the second data stream.
7. The system of claim 1, wherein a first page level attribute is included in a memory request that is part of a first instruction stream executed by a first client device and a second page level attribute different from the first page level attribute is included in a memory request that is part of a second instruction stream executed by the first client device or a different second client device, and
wherein the cache is configured to distinguish the first instruction stream from the second instruction stream based on the first page level attribute and the second page level attribute to allocate a first portion of the cache memory to the first instruction stream and a second portion of the cache memory to the second instruction stream.
8. The system of claim 1, wherein a first page level attribute is included in a memory request that is part of a first data stream written to a first data buffer, and a second page level attribute, different from the first page level attribute, is included in a memory request that is part of a second data stream written to a second data buffer, and
wherein the cache is configured to distinguish the first data stream from the second data stream based on the first page level attribute and the second page level attribute to allocate a first portion of the cache memory to the first data stream and a second portion of the cache memory to the second data stream based on respective first page level attributes and respective second page level attributes included in the first data stream and the second data stream.
9. The system of claim 8, wherein the cache is configured to use the page level attribute to allocate more cache memory to a data buffer storing page table data than to a data buffer storing non-page table data.
10. The system of claim 1, wherein the cache is configured to assign different replacement policies to different memory requests based on the respective page-level attributes of the memory requests.
11. The system of claim 10, wherein the cache is configured to identify a first data stream written by a producer process executing on one of the one or more client devices, wherein the first data stream written by the producer process is consumed by a consumer process executing on one of the one or more client devices,
wherein the cache is configured to allocate a first portion of the cache memory to the first data stream written by the producer process.
12. The system of claim 11, wherein the cache is configured to invalidate a cache entry in the first portion of cache memory whenever the cache entry is read by the consumer process.
13. The system of claim 1, wherein the cache is configured to:
determining that the page-level attribute of the read request indicates that the particular page uses compressed data, and
in response, reading less than a complete cache line from memory to fulfill the read request.
14. The system of claim 1, wherein the cache is configured to map page-level attribute values to particular partition identifiers, thereby associating the particular partition identifiers with a plurality of different pages in the first memory.
15. A computer-implemented method performed by a computing system, the method comprising:
receiving, by the system, one or more memory requests generated by respective ones of the one or more client devices, each memory request having a respective physical address and a respective page-level attribute of a respective page of a first memory to which the physical address belongs; and
Caching, by a cache of the system, the memory request to the first memory in a cache memory by:
distinguishing between different memory requests using page level attributes of physical pages in the first memory of the memory requests, and
allocating different portions of the cache memory to different respective memory requests.
16. The method of claim 15, wherein the first page level attribute is included in a memory request that is part of a first instruction stream executed by a client device and the second page level attribute is included in a memory request that is part of a second data stream used by the client device,
wherein distinguishing between different memory requests includes distinguishing the first instruction stream from the second data stream for allocation based on the first page level attribute and the second page level attribute, and
wherein a first portion of the cache memory is allocated to the first instruction stream and a second portion of the cache memory is allocated to the second data stream.
17. The method of claim 15, wherein a first page level attribute is included in a memory request that is part of a first instruction stream to be executed by a first client device and a second page level attribute different from the first page level attribute is included in a memory request that is part of a second instruction stream to be executed by the first client device or a different second client device,
wherein distinguishing between different memory requests includes distinguishing the first instruction stream from the second instruction stream for allocation based on the first page level attribute and the second page level attribute, and
wherein a first portion of the cache memory is allocated to the first instruction stream and a second portion of the cache memory is allocated to the second instruction stream.
18. The method of claim 15, wherein the first page level attribute is included in a memory request that is part of a first data stream written to the first data buffer and the second page level attribute is included in a memory request that is part of a second data stream written to the second data buffer,
wherein distinguishing between different memory requests includes distinguishing the first data stream from the second data stream for allocation based on the first page level attribute and the second page level attribute, and
wherein a first portion of the cache memory is allocated to the first data stream and a second portion of the cache memory is allocated to the second data stream.
19. The method of claim 15, further comprising: identifying a first data stream written by a producer process executing on one of the one or more client devices, wherein the first data stream written by the producer process is consumed by a consumer process executing on one of the one or more client devices,
wherein the cache is configured to allocate a first portion of the cache memory to the first data stream written by the producer process.
20. The method of claim 15, further comprising:
determining that the page-level attribute of the read request indicates that the particular page uses compressed data, and
in response, reading less than a complete cache line from memory to fulfill the read request.
CN202080005647.7A 2019-02-13 2020-01-28 Method, system and storage medium for allocating cache resources Active CN113039531B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311611146.5A CN117707998A (en) 2019-02-13 2020-01-28 Method, system and storage medium for allocating cache resources

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201962805167P 2019-02-13 2019-02-13
US62/805,167 2019-02-13
PCT/US2020/015433 WO2020167459A1 (en) 2019-02-13 2020-01-28 Caching streams of memory requests

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202311611146.5A Division CN117707998A (en) 2019-02-13 2020-01-28 Method, system and storage medium for allocating cache resources

Publications (2)

Publication Number Publication Date
CN113039531A CN113039531A (en) 2021-06-25
CN113039531B true CN113039531B (en) 2023-12-01

Family

ID=69740612

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202311611146.5A Pending CN117707998A (en) 2019-02-13 2020-01-28 Method, system and storage medium for allocating cache resources
CN202080005647.7A Active CN113039531B (en) 2019-02-13 2020-01-28 Method, system and storage medium for allocating cache resources

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202311611146.5A Pending CN117707998A (en) 2019-02-13 2020-01-28 Method, system and storage medium for allocating cache resources

Country Status (5)

Country Link
US (2) US11188472B2 (en)
EP (1) EP3850492A1 (en)
CN (2) CN117707998A (en)
TW (2) TWI787129B (en)
WO (1) WO2020167459A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11188472B2 (en) 2019-02-13 2021-11-30 Google Llc Caching streams of memory requests
TWI796943B (en) * 2022-01-27 2023-03-21 凌群電腦股份有限公司 A processing system that realizes high-efficiency computing by using cache mirroring data
EP4352587A1 (en) * 2022-08-26 2024-04-17 Google Inc. Adaptive caching of memory request streams

Citations (7)

Publication number Priority date Publication date Assignee Title
CN101027652A (en) * 2004-09-30 2007-08-29 英特尔公司 Caching support for direct memory access address translation
CN102216898A (en) * 2008-05-30 2011-10-12 飞思卡尔半导体公司 Utilization of a store buffer for error recovery on a store allocation cache miss
CN102576333A (en) * 2009-10-05 2012-07-11 马维尔国际贸易有限公司 Data caching in non-volatile memory
CN102804152A (en) * 2009-05-15 2012-11-28 甲骨文美国公司 Cache coherent support for flash in a memory hierarchy
CN104995611A (en) * 2013-01-21 2015-10-21 美光科技公司 Systems and methods for accessing memory
CN107861887A (en) * 2017-11-30 2018-03-30 科大智能电气技术有限公司 A kind of control method of serial volatile memory
CN107864173A (en) * 2017-06-26 2018-03-30 平安普惠企业管理有限公司 Terminal page caching method, system and readable storage medium storing program for executing

Family Cites Families (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7370174B2 (en) 2005-01-05 2008-05-06 Intel Corporation Method, system, and program for addressing pages of memory by an I/O device
US7814292B2 (en) 2005-06-14 2010-10-12 Intel Corporation Memory attribute speculation
US8996807B2 (en) * 2011-02-15 2015-03-31 Intelligent Intellectual Property Holdings 2 Llc Systems and methods for a multi-level cache
KR102069273B1 (en) 2013-03-11 2020-01-22 Samsung Electronics Co Ltd System on chip and operating method thereof
US8806098B1 (en) 2013-03-15 2014-08-12 Avalanche Technology, Inc. Multi root shared peripheral component interconnect express (PCIe) end point
US9430392B2 (en) 2014-03-26 2016-08-30 Intel Corporation Supporting large pages in hardware prefetchers
US10083131B2 (en) 2014-12-11 2018-09-25 Ampere Computing Llc Generating and/or employing a descriptor associated with a memory translation table
US9678872B2 (en) 2015-01-16 2017-06-13 Oracle International Corporation Memory paging for processors using physical addresses
US9846648B2 (en) * 2015-05-11 2017-12-19 Intel Corporation Create page locality in cache controller cache allocation
US9652405B1 (en) * 2015-06-30 2017-05-16 EMC IP Holding Company LLC Persistence of page access heuristics in a memory centric architecture
CN105183565B (en) * 2015-09-30 2018-12-07 Huawei Technologies Co Ltd Computer, and method and apparatus for controlling quality of service
US9703493B2 (en) 2015-12-14 2017-07-11 Qualcomm Incorporated Single-stage arbiter/scheduler for a memory system comprising a volatile memory and a shared cache
US20170220466A1 (en) 2016-01-30 2017-08-03 Intel Corporation Sharing a guest physical address space among virtualized contexts
US10776283B2 (en) * 2016-04-01 2020-09-15 Intel Corporation Techniques to provide a secure system management mode
US11126565B2 (en) * 2016-06-27 2021-09-21 Hewlett Packard Enterprise Development Lp Encrypted memory access using page table attributes
US9971691B2 (en) * 2016-09-12 2018-05-15 Intel Corporation Selective application of interleave based on type of data to be stored in memory
US10592115B1 (en) * 2016-11-30 2020-03-17 EMC IP Holding Company LLC Cache management system and method
US11188472B2 (en) 2019-02-13 2021-11-30 Google Llc Caching streams of memory requests

Also Published As

Publication number Publication date
US11853223B2 (en) 2023-12-26
US20200257631A1 (en) 2020-08-13
TW202234248A (en) 2022-09-01
US20220156198A1 (en) 2022-05-19
CN117707998A (en) 2024-03-15
TW202036299A (en) 2020-10-01
WO2020167459A1 (en) 2020-08-20
CN113039531A (en) 2021-06-25
TWI761762B (en) 2022-04-21
US11188472B2 (en) 2021-11-30
EP3850492A1 (en) 2021-07-21
TWI787129B (en) 2022-12-11

Similar Documents

Publication Publication Date Title
US20200242046A1 (en) Method, system, and apparatus for page sizing extension
US10802987B2 (en) Computer processor employing cache memory storing backless cache lines
US8866831B2 (en) Shared virtual memory between a host and discrete graphics device in a computing system
JP6009589B2 (en) Apparatus and method for reducing castout in a multi-level cache hierarchy
US11853223B2 (en) Caching streams of memory requests
US20130151779A1 (en) Weighted History Allocation Predictor Algorithm in a Hybrid Cache
US20070061549A1 (en) Method and an apparatus to track address translation in I/O virtualization
JP2008033928A (en) Dedicated mechanism for page mapping in gpu
US9135177B2 (en) Scheme to escalate requests with address conflicts
US11194735B2 (en) Technologies for flexible virtual function queue assignment
US10108553B2 (en) Memory management method and device and memory controller
US11620243B2 (en) Way partitioning for a system-level cache
US10467138B2 (en) Caching policies for processing units on multiple sockets
CN110209354B (en) Method, apparatus, device and medium for processing data
CN115509959A (en) Processing system, control method, chip, and computer-readable storage medium
US20230169013A1 (en) Address translation cache and system including the same

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant