US20170249249A1 - System and method for improved memory performance using cache level hashing - Google Patents
- Publication number
- US20170249249A1 (application US15/054,295)
- Authority
- US
- United States
- Prior art keywords
- hashing
- cache
- clients
- memory device
- read
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/0811—Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/0223—User address space allocation, e.g. contiguous or non contiguous base addressing
- G06F12/023—Free address space management
- G06F12/0238—Memory management in non-volatile memory, e.g. resistive RAM or ferroelectric memory
- G06F12/0246—Memory management in non-volatile memory, e.g. resistive RAM or ferroelectric memory in block erasable memory, e.g. flash memory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0804—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with main memory updating
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0866—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches for peripheral storage systems, e.g. disk cache
- G06F12/0868—Data transfer between cache memory and other subsystems, e.g. storage devices or host systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0866—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches for peripheral storage systems, e.g. disk cache
- G06F12/0871—Allocation or management of cache space
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/10—Providing a specific technical effect
- G06F2212/1016—Performance improvement
- G06F2212/1024—Latency reduction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/10—Providing a specific technical effect
- G06F2212/1041—Resource optimization
- G06F2212/1044—Space efficiency improvement
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/28—Using a specific disk cache architecture
- G06F2212/283—Plural cache memories
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/31—Providing disk cache in a specific location of a storage system
- G06F2212/313—In storage device
Definitions
- Portable computing devices commonly contain integrated circuits, or systems on a chip (“SoC”), that include numerous components designed to work together to deliver functionality to a user.
- SoC may contain any number of master components such as modems, displays, central processing units (“CPUs”), graphical processing units (“GPUs”), etc. that are used by application clients to process workloads.
- the master components read and/or write data and/or instructions to and/or from memory components on the SoC.
- the data and instructions may be generally termed “transactions” and are transmitted between the devices via a collection of wires known as a bus.
- master components make use of closely coupled memory devices whenever possible, such as level one (“L1”) and level two (“L2”) cache devices, because cache devices provide a dedicated means to quickly handle read and write transactions for a given master component. If a read transaction cannot be handled by a cache, the event is termed a “cache miss” and the transaction is forwarded to a memory controller that manages access to a slower, but higher capacity long term memory device, such as a double data rate (“DDR”) memory device. Similarly, the data stored in a cache by write transactions from a master component must periodically be flushed to the memory controller for updating and long term storage in the DDR. Notably, while each master component may have the benefit of a dedicated cache memory, the DDR memory device is shared by all master components.
- L1 level one
- L2 level two cache devices
- master components may employ index hashing techniques to optimize the use of the inherently limited cache capacity. Even the most efficiently used cache, however, cannot accommodate all the transactions all the time.
- the master component works with the memory controller to fulfill the transactions at the DDR.
- cache miss transaction streams and write transaction streams emanating concurrently from a master component's last level cache can result in page conflicts at the DDR as both streams compete to access the same DDR memory bank.
- Page conflicts increase transaction latency, unnecessarily consume power resources, and reduce bus bandwidth availability.
- memory controllers known in the art often employ a “one size fits all” DDR bank hashing technique to increase the probability that concurrent read and write transaction streams emanating from a given master component may be accommodated simultaneously from different banks within the DDR.
- a shortcoming of a DDR bank hashing technique is that a single hashing algorithm is usually not optimal, or even desirable, for all master components seeking access to the DDR. And so, prior art solutions often rely on a validation procedure during product development to determine which single hashing technique works best, even though not optimally, for all master components concerned. Notably, because a single hashing technique will inevitably not be optimal for all master components, prior art solutions are prone to unacceptable transaction latencies and less than optimal DDR memory utilization.
- SoC system on a chip
- eligibility for hashing the transaction traffic of one or more application clients is determined.
- a customized hashing algorithm is selected and applied via a cache index hash module to read and write transactions associated with a low level cache.
- the transactions, having been hashed at the cache level, are directed from the low level cache to a memory controller associated with a multi-bank memory device accessible by a plurality of clients (as opposed to the cache which is accessible by only the one client).
- the hashed read and write transactions are fulfilled from different banks of the multi-bank memory device, such as different banks of a DDR memory device.
- the most optimal hash algorithm for each client may be used for transactions emanating from that client.
- because those clients that do not require or benefit from hashing their transaction traffic are not affected by a cache index hash module, the need for validating the source of the transactions is avoided.
- FIG. 1 is a functional block diagram illustrating an exemplary, non-limiting aspect of a portable computing device (“PCD”) in the form of a wireless telephone for implementing cache-level memory management (“CMM”) systems and methods;
- PCD portable computing device
- CMM cache-level memory management
- FIGS. 2A-2B are illustrations of read and write stream flows affecting DDR memory utilization
- FIG. 3 is a functional block diagram illustrating a prior art system for memory management using a DDR bank hashing module
- FIG. 4 is a functional block diagram illustrating an exemplary embodiment of an on-chip system for cache-level memory management (“CMM”) of a memory subsystem
- FIG. 5 is a logical flowchart illustrating an exemplary method for cache-level memory management (“CMM”) according to the solution.
- CMM cache-level memory management
- an “application” may also include files having executable content, such as: object code, scripts, byte code, markup language files, and patches.
- an “application” referred to herein may also include files that are not executable in nature, such as documents that may need to be opened or other data files that need to be accessed.
- DDR double data rate
- RAM volatile random access memory
- a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer.
- an application running on a computing device and the computing device may be a component.
- One or more components may reside within a process and/or thread of execution, and a component may be localized on one computer and/or distributed between two or more computers.
- these components may execute from various computer readable media having various data structures stored thereon.
- the components may communicate by way of local and/or remote processes such as in accordance with a signal having one or more data packets (e.g., data from one component interacting with another component in a local system, distributed system, and/or across a network such as the Internet with other systems by way of the signal).
- CPU central processing unit
- DSP digital signal processor
- GPU graphical processing unit
- chips are used interchangeably.
- a CPU, DSP, GPU or a chip may be comprised of one or more distinct processing components generally referred to herein as “core(s).”
- a master component may refer to, but is not limited to refer to, a CPU, DSP, GPU, modem, controller, display, camera, etc.
- a master component comprised within an embodiment of the solution may leverage a customized cache-level hashing technique or, alternatively, may not. For a given master component that does not leverage a customized cache-level hashing technique, it is envisioned that it may comprise dedicated hardware to generate a special transaction traffic pattern particularly suited for that given master component.
- “writeback” and “flush” refer to the process of updating data and/or instructions instantiated in a DDR based on fresher versions of the data and/or instructions that exist in a closely coupled memory (e.g., an L2 cache) associated with one or more master components.
- a closely coupled memory, e.g., an L2 cache
- data instantiated in a closely coupled memory to a processing component, such as a low level L2 cache memory
- the DDR memory address may be associated with a certain data bank in the DDR for storing data in either a compressed or decompressed format, as would be understood by one of ordinary skill in the art.
- a memory controller may seek to update DDR, as would be understood by one of ordinary skill in the art.
- the term “dirty bit” will be understood to be a bit associated with a virtual memory page in a cache that indicates that the data stored in the memory page has been generated anew or modified from its original state by a master component, but not yet written back to DDR.
- when a writeback transaction to the DDR seeks to update a memory address in the same bank as another memory address which is the target of a read transaction, a page conflict may occur.
- embodiments of the solution leverage cache-level hashing techniques to store data in virtual memory addresses that point to DDR memory addresses in different banks.
- bus refers to a collection of wires through which data is transmitted from a processing engine to a memory component or other device located on or off the SoC. It will be understood that a bus consists of two parts—an address bus and a data bus where the data bus transfers actual data and the address bus transfers information specifying location of the data in a memory component.
- the term “width” or “bus width” or “bandwidth” refers to an amount of data, i.e., a “chunk size,” that may be transmitted per cycle through a given bus. For example, a 16-byte bus may transmit 16 bytes of data at a time, whereas a 32-byte bus may transmit 32 bytes of data per cycle.
- bus speed refers to the number of times a chunk of data may be transmitted through a given bus each second.
- a “bus cycle” or “cycle” refers to transmission of one chunk of data through a given bus.
- PCD portable computing device
- 3G third generation
- 4G fourth generation
- a PCD may be a cellular telephone, a satellite telephone, a pager, a PDA, a smartphone, a navigation device, a smartbook or reader, a media player, a combination of the aforementioned devices, a laptop computer with a wireless connection, among others.
- master processing components in a shared memory multiprocessor system use the memory subsystem to exchange information and perform synchronization. Consequently, memory contention (i.e., page conflicts) associated with multiple read and write transactions seeking simultaneous access to a common bank of the shared memory subsystem may cause QoS to suffer.
- memory contention i.e., page conflicts
- Prior art solutions combat memory contention by employing a hashing block at the memory controller in an effort to disperse data storage across multiple banks which may be accessed simultaneously.
- a read transaction and a write transaction emanating from an L2 cache of a master component may both point to the same DDR bank, but the hashing block of the memory controller works to accommodate both requests from different banks.
- a problem with employing a hashing block at the memory controller is that the hashing algorithm used may not be optimal for all master components that share access to the DDR.
- because some master components may include specialized traffic pattern generators, applying the hashing algorithm at the memory controller to their particular transaction traffic may be detrimental to their functionality. Consequently, prior art solutions often employ an “extra step” to validate the master component from which a given transaction has emanated before subjecting the given transaction to the hashing block of the memory controller. Such validation steps contribute to bus congestion, power consumption and increased latencies.
- embodiments of a cache-level memory management (“CMM”) solution avoid hashing and validation at the memory controller by employing dedicated, customized hashing modules at the lower level caches of each master component.
- CMM cache-level memory management
- no dedicated hashing module may be comprised or, alternatively, a hashing module dedicated to the given master component may be bypassed or “turned off.”
- competing transactions emanating from a given master component may be associated with virtual memory addresses in the closely coupled cache that already point to different DDR banks when received by the memory controller.
- FIG. 1 is a functional block diagram illustrating an exemplary, non-limiting aspect of a portable computing device (“PCD”) in the form of a wireless telephone for implementing cache-level memory management (“CMM”) systems and methods.
- the PCD 100 includes an on-chip system 102 that includes a multi-core central processing unit (“CPU”) 110 and an analog signal processor 126 that are coupled together.
- the CPU 110 may comprise a zeroth core 222 , a first core 224 , and an Nth core 230 as understood by one of ordinary skill in the art.
- a digital signal processor (“DSP”) may also be employed as understood by one of ordinary skill in the art.
- the memory subsystem 112 comprises, inter alia, a memory controller 215 , dedicated caches for master components, and a DDR memory 115 (collectively depicted in the FIG. 1 illustration as memory subsystem 112 ).
- the memory subsystem 112 in general, and some of its components specifically, may be formed from hardware and/or firmware and may be responsible for cache-level memory management using component-customized hashing methodologies that work to mitigate page conflicts when accessing data and instructions stored in the DDR memory 115 .
- CMM solutions optimize transaction latencies when data is flushed to, or read from, non-volatile memory as well as minimize data traffic on the bus 205 (not shown in FIG. 1 ) and reduce power consumption on the SoC.
- a display controller 128 and a touch screen controller 130 are coupled to the digital signal processor 110 .
- a touch screen display 132 external to the on-chip system 102 is coupled to the display controller 128 and the touch screen controller 130 .
- PCD 100 may further include a video encoder 134 , e.g., a phase-alternating line (“PAL”) encoder, a sequential couleur avec memoire (“SECAM”) encoder, a national television system(s) committee (“NTSC”) encoder or any other type of video encoder 134 .
- the video encoder 134 is coupled to the multi-core CPU 110 .
- a video amplifier 136 is coupled to the video encoder 134 and the touch screen display 132 .
- a video port 138 is coupled to the video amplifier 136 .
- a universal serial bus (“USB”) controller 140 is coupled to the CPU 110 .
- a USB port 142 is coupled to the USB controller 140 .
- the memory subsystem 112 which may include a PoP memory, a mask ROM/Boot ROM, a boot OTP memory, a DDR memory 115 (see subsequent Figures), caches and customized hashing modules may also be coupled to the CPU 110 and/or include its own dedicated processor(s).
- a subscriber identity module (“SIM”) card 146 may also be coupled to the CPU 110 .
- a digital camera 148 may be coupled to the CPU 110 .
- the digital camera 148 is a charge-coupled device (“CCD”) camera or a complementary metal-oxide semiconductor (“CMOS”) camera.
- CCD charge-coupled device
- CMOS complementary metal-oxide semiconductor
- a stereo audio CODEC 150 may be coupled to the analog signal processor 126 .
- an audio amplifier 152 may be coupled to the stereo audio CODEC 150 .
- a first stereo speaker 154 and a second stereo speaker 156 are coupled to the audio amplifier 152 .
- FIG. 1 shows that a microphone amplifier 158 may be also coupled to the stereo audio CODEC 150 .
- a microphone 160 may be coupled to the microphone amplifier 158 .
- a frequency modulation (“FM”) radio tuner 162 may be coupled to the stereo audio CODEC 150 .
- an FM antenna 164 is coupled to the FM radio tuner 162 .
- stereo headphones 166 may be coupled to the stereo audio CODEC 150 .
- FM frequency modulation
- FIG. 1 further indicates that a radio frequency (“RF”) transceiver 168 may be coupled to the analog signal processor 126 .
- An RF switch 170 may be coupled to the RF transceiver 168 and an RF antenna 172 .
- a keypad 174 may be coupled to the analog signal processor 126 .
- a mono headset with a microphone 176 may be coupled to the analog signal processor 126 .
- a vibrator device 178 may be coupled to the analog signal processor 126 .
- FIG. 1 also shows that a power supply 188 , for example a battery, is coupled to the on-chip system 102 through a power management integrated circuit (“PMIC”) 180 .
- the power supply 188 includes a rechargeable DC battery or a DC power supply that is derived from an alternating current (“AC”) to DC transformer that is connected to an AC power source.
- AC alternating current
- the CPU 110 may also be coupled to one or more internal, on-chip thermal sensors 157 A as well as one or more external, off-chip thermal sensors 157 B.
- the on-chip thermal sensors 157 A may comprise one or more proportional to absolute temperature (“PTAT”) temperature sensors that are based on vertical PNP structure and are usually dedicated to complementary metal oxide semiconductor (“CMOS”) very large-scale integration (“VLSI”) circuits.
- CMOS complementary metal oxide semiconductor
- VLSI very large-scale integration
- the off-chip thermal sensors 157 B may comprise one or more thermistors.
- the thermal sensors 157 may produce a voltage drop that is converted to digital signals with an analog-to-digital converter (“ADC”) controller (not shown).
- ADC analog-to-digital converter
- other types of thermal sensors 157 may be employed.
- the touch screen display 132 , the video port 138 , the USB port 142 , the camera 148 , the first stereo speaker 154 , the second stereo speaker 156 , the microphone 160 , the FM antenna 164 , the stereo headphones 166 , the RF switch 170 , the RF antenna 172 , the keypad 174 , the mono headset 176 , the vibrator 178 , thermal sensors 157 B, the PMIC 180 and the power supply 188 are external to the on-chip system 102 . It will be understood, however, that one or more of these devices depicted as external to the on-chip system 102 in the exemplary embodiment of a PCD 100 in FIG. 1 may reside on chip 102 in other exemplary embodiments.
- one or more of the method steps described herein may be implemented by executable instructions and parameters stored in the memory subsystem 112 or that form the memory controller 215, the cache(s) and/or the cache index hash module(s) 202 (see FIG. 4). Further, the memory controller 215, the cache(s) and/or the cache index hash module(s) 202, the instructions stored therein, or a combination thereof may serve as a means for performing one or more of the method steps described herein.
- FIG. 2 is an illustration of read and write stream flows affecting DDR memory utilization.
- the FIG. 2A illustration, specifically, depicts a pair of read and write transaction streams emanating from a cache associated with a master component, such as a CPU core, for example.
- the virtual memory addresses in the cache point to DDR memory addresses located in a Data Bank A and, as such, both transaction streams are competing for access to the same data bank.
- the DDR page conflict may cause unacceptable increases in transaction latencies, unnecessary power consumption, bus congestion, etc.
- FIG. 2B illustration depicts the same pair of read and write transaction streams emanating from a cache associated with a master component.
- the memory addresses associated with the read and write streams have been hashed such that the streams are directed to two different banks of the DDR, the read stream directed to Data Bank A and the write stream directed to Data Bank B. In this way, the transactions of both streams may be simultaneously serviced.
- prior art solutions try to accomplish such hashing by employing a hash module at the memory controller, a method which often requires a validation step to make sure that each stream is eligible for hashing.
- Embodiments of a cache-level memory management solution employ hashing functions at the cache-level so that each master component benefits from a hash methodology that is optimized for its particular needs. Also, because in a CMM solution the transaction streams emanating from the low level caches of master components are already hashed upon arrival at the memory controller, a validation step is unnecessary.
- the use of customized hashing algorithms at the cache level in order to dictate long term storage addresses in the DDR enables embodiments of a CMM solution to avoid DDR page conflicts which may result in relatively lower transaction latencies when compared to prior art approaches.
- FIG. 3 is a functional block diagram illustrating a prior art system for memory management using a DDR bank hashing module 216 .
- the master components 201 are processing workloads according to the demands of various application clients and, in doing so, are issuing read and write transactions to the DDR 115 via bus 205 .
- the transactions are marshaled by memory controller 215 . Because the read and write transaction streams may be directed to memory addresses in the DDR 115 which are located in a common bank, the memory controller 215 utilizes DDR bank hashing module 216 to employ a single hashing algorithm, as would be understood by one of ordinary skill in the art.
- the purpose of the hashing step by the DDR bank hashing module 216 is to point the various transactions to memory addresses residing in different banks (e.g., data bank A and data bank B) of the DDR 115 to minimize page conflicts, but the nature of using one hashing algorithm for all transaction streams without regard for the master component from which a given stream was issued is that the hashing algorithm will inevitably be non-optimal for one or more of the master components.
- validation of those streams must be taken into consideration during development so that hashing is bypassed.
- This step may come at a high cost to transaction latency, as the additional time to identify the validated transaction prior to hashing impacts the overall time required to fulfill the transaction.
- the validation step may also come at a high cost to power consumption and bus congestion.
- FIG. 4 is a functional block diagram illustrating an exemplary embodiment of an on-chip system 102 for cache-level memory management (“CMM”) of a memory subsystem 112 .
- CMM cache-level memory management
- In carrying out various workloads, master components 203 generate transaction requests for either updating or returning data and instructions stored in DDR 115. As would be understood by one of ordinary skill in the art, the transaction requests are directed over a bus 205 to a memory controller 215 that marshals the requests and manages the DDR 115 image.
- the master components 203 may comprise a cache index hash module 202 (as illustrated with master components 203A, 203B) that works to hash the DDR memory locations for transaction streams emanating from the respective L2 caches. Because the cache index hashing modules 202A, 202B are respectively associated with given master components 203A, 203B, the particular hashing algorithms employed by the modules 202A, 202B may be customized and tuned for the particular needs and functionality of the associated master component 203. In this way, a CMM embodiment provides for the selection and provision of multiple hashing algorithms, each optimized for the transaction stream from the particular master component to which it is applied.
- embodiments of a CMM solution provide for the ability to simply “turn off,” or altogether decline the inclusion of, a cache index hash module (as illustrated with master component 203N).
- hashing algorithms that may be employed by a given cache index hashing module 202
- those of ordinary skill in the art may select hashing algorithms best suited for individual master components.
- a CMM embodiment is not limited to the use of any particular hashing algorithm or hardware arrangement for employing a hashing algorithm.
- memory addresses may be hashed using an XOR gate, as would be understood by one of ordinary skill in the art. Assume, for example, that DDR bank bits are addressed as [15:13] and that a 1 MB L2 cache associated with a given master component includes a 64B line size.
- a cache index hash module 202 may apply logic represented as:
- DDR memory 115 is depicted to comprise two memory regions, a Data Bank A and a Data Bank B.
- the memory controller 215, which may comprise its own processor, receives the transaction requests generated by the master components 203 and already hashed via the cache index hash module(s) 202. Because the transaction requests arrive at the memory controller already hashed, validation of the transaction source prior to hashing may be alleviated in embodiments of a CMM solution.
- CMM solutions may relieve congestion on the bus 205 , improve average transaction latency, optimize memory capacity utilization and minimize power consumption across the SoC 102 .
- each application client in a CMM embodiment benefits from its master components having a dedicated, customized cache index hashing module 202 which may be tuned based on the master component's unique traffic pattern and cache organization. In this way, a CMM embodiment may deliver an overall memory performance that is optimized. Additionally, CMM embodiments provide an option for certain master components to opt out of hashing, such as those application clients and master components that employ dedicated hardware to generate specific traffic patterns that should not be changed.
- FIG. 5 is a logical flowchart illustrating an exemplary method 500 for cache-level memory management (“CMM”) according to an embodiment of the solution.
- CMM cache-level memory management
- a cache index hash module 202 is customized with a hashing algorithm that is optimal for the given client and/or master component.
- the customized hashing algorithm is applied to the memory addresses identified in the various transaction streams such that, when forwarded to a memory controller at block 520 , the DDR memory addresses identified in the transactions are likely to reside in differing DDR banks.
- embodiments of CMM solutions minimize page conflicts at the DDR, optimize cache usage for each given client, reduce transaction latencies, and avoid unnecessary validation steps which consume power and contribute to bus congestion.
- the method 500 returns.
- the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted as one or more instructions or code on a computer-readable device.
- Computer-readable devices include both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another.
- a storage media may be any available media that may be accessed by a computer.
- such computer-readable media may comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that may be used to carry or store desired program code in the form of instructions or data structures and that may be accessed by a computer.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
Description
- Portable computing devices (“PCDs”) commonly contain integrated circuits, or systems on a chip (“SoC”), that include numerous components designed to work together to deliver functionality to a user. For example, a SoC may contain any number of master components such as modems, displays, central processing units (“CPUs”), graphical processing units (“GPUs”), etc. that are used by application clients to process workloads. In processing the workloads, the master components read and/or write data and/or instructions to and/or from memory components on the SoC. The data and instructions may be generally termed “transactions” and are transmitted between the devices via a collection of wires known as a bus.
- As would be understood by one of ordinary skill in the art, master components make use of closely coupled memory devices whenever possible, such as level one (“L1”) and level two (“L2”) cache devices, because cache devices provide a dedicated means to quickly handle read and write transactions for a given master component. If a read transaction cannot be handled by a cache, the event is termed a “cache miss” and the transaction is forwarded to a memory controller that manages access to a slower, but higher capacity long term memory device, such as a double data rate (“DDR”) memory device. Similarly, the data stored in a cache by write transactions from a master component must periodically be flushed to the memory controller for updating and long term storage in the DDR. Notably, while each master component may have the benefit of a dedicated cache memory, the DDR memory device is shared by all master components.
- Transactions are most quickly serviced by cache, and so master components may employ index hashing techniques to optimize the use of the inherently limited cache capacity. Even the most efficiently used cache, however, cannot accommodate all the transactions all the time. When the cache cannot service a read transaction, or when data written in the cache is flushed to the DDR to make room in the cache for new data, the master component works with the memory controller to fulfill the transactions at the DDR.
- Notably, cache miss transaction streams and write transaction streams emanating concurrently from a master component's last level cache can result in page conflicts at the DDR as both streams compete to access the same DDR memory bank. Page conflicts increase transaction latency, unnecessarily consume power resources, and reduce bus bandwidth availability. To minimize page conflicts, memory controllers known in the art often employ a “one size fits all” DDR bank hashing technique to increase the probability that concurrent read and write transaction streams emanating from a given master component may be accommodated simultaneously from different banks within the DDR.
- A shortcoming of a DDR bank hashing technique is that a single hashing algorithm is usually not optimal, or even desirable, for all master components seeking access to the DDR. And so, prior art solutions often rely on a validation procedure during product development to determine which single hashing technique works best, even though not optimally, for all master components concerned. Notably, because a single hashing technique will inevitably not be optimal for all master components, prior art solutions are prone to unacceptable transaction latencies and less than optimal DDR memory utilization.
- Therefore, there is a need in the art for a system and method that optimizes DDR memory utilization through the use of multiple cache level hashing techniques that are optimal for associated master components. Moreover, there is a need in the art for a system and method that optimizes transaction latencies by alleviating the need to use a single, one-size-fits-all DDR-level bank hashing scheme.
- Various embodiments of methods and systems for cache-level memory management in a system on a chip (“SoC”) are disclosed. In an exemplary embodiment, eligibility for hashing the transaction traffic of one or more application clients is determined. For each of the clients that is determined to be eligible for hashing transaction traffic, a customized hashing algorithm is selected and applied via a cache index hash module to read and write transactions associated with a low level cache. The transactions, having been hashed at the cache level, are directed from the low level cache to a memory controller associated with a multi-bank memory device accessible by a plurality of clients (as opposed to the cache which is accessible by only the one client). Advantageously, the hashed read and write transactions are fulfilled from different banks of the multi-bank memory device, such as different banks of a DDR memory device. In this way, the most optimal hash algorithm for each client may be used for transactions emanating from that client. And, because those clients that do not require or benefit from hashing their transaction traffic are not affected by a cache index hash module, the need for validating the source of the transactions is avoided.
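- As a concrete illustration of the flow summarized above, the short C sketch below shows one way the per-client eligibility check and the customized hash could fit together at the cache level. The type names, the configuration structure, and the function-pointer representation of a hashing algorithm are assumptions made for this example and are not taken from the disclosure.

```c
/* Illustrative sketch only: per-client cache-level hashing as summarized
 * above. Type and field names are assumptions, not from the disclosure. */
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t (*hash_fn_t)(uint64_t addr);   /* a customized hashing algorithm */

struct client_cfg {
    bool      hash_eligible;   /* eligibility determined per application client */
    hash_fn_t hash;            /* algorithm tuned for this client's cache and traffic */
};

/* Applied at the low level cache, before the read or write transaction is
 * directed to the memory controller fronting the shared multi-bank DDR. */
uint64_t cache_index_hash(const struct client_cfg *cfg, uint64_t addr)
{
    if (!cfg->hash_eligible)   /* ineligible clients pass through unmodified */
        return addr;
    return cfg->hash(addr);    /* hashed transactions tend to land in different banks */
}
```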
- In the drawings, like reference numerals refer to like parts throughout the various views unless otherwise indicated. For reference numerals with letter character designations such as “102A” or “102B”, the letter character designations may differentiate two like parts or elements present in the same figure. Letter character designations for reference numerals may be omitted when it is intended that a reference numeral encompass all parts having the same reference numeral in all figures.
- FIG. 1 is a functional block diagram illustrating an exemplary, non-limiting aspect of a portable computing device (“PCD”) in the form of a wireless telephone for implementing cache-level memory management (“CMM”) systems and methods;
- FIGS. 2A-2B are illustrations of read and write stream flows affecting DDR memory utilization;
- FIG. 3 is a functional block diagram illustrating a prior art system for memory management using a DDR bank hashing module;
- FIG. 4 is a functional block diagram illustrating an exemplary embodiment of an on-chip system for cache-level memory management (“CMM”) of a memory subsystem; and
- FIG. 5 is a logical flowchart illustrating an exemplary method for cache-level memory management (“CMM”) according to the solution.
- The word “exemplary” is used herein to mean serving as an example, instance, or illustration. Any aspect described herein as “exemplary” is not necessarily to be construed as exclusive, preferred or advantageous over other aspects.
- In this description, the term “application” may also include files having executable content, such as: object code, scripts, byte code, markup language files, and patches. In addition, an “application” referred to herein may also include files that are not executable in nature, such as documents that may need to be opened or other data files that need to be accessed.
- In this description, reference to double data rate “DDR” memory components will be understood to envision any of a broader class of volatile random access memory (“RAM”) used for long term data storage and will not limit the scope of the solutions disclosed herein to a specific type or generation of RAM.
- As used in this description, the terms “component,” “database,” “module,” “system,” “controller,” and the like are intended to refer to a computer-related entity, either hardware, firmware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a computing device and the computing device may be a component. One or more components may reside within a process and/or thread of execution, and a component may be localized on one computer and/or distributed between two or more computers. In addition, these components may execute from various computer readable media having various data structures stored thereon. The components may communicate by way of local and/or remote processes such as in accordance with a signal having one or more data packets (e.g., data from one component interacting with another component in a local system, distributed system, and/or across a network such as the Internet with other systems by way of the signal).
- In this description, the terms “central processing unit (“CPU”),” “digital signal processor (“DSP”),” “graphical processing unit (“GPU”),” and “chip” are used interchangeably. Moreover, a CPU, DSP, GPU or a chip may be comprised of one or more distinct processing components generally referred to herein as “core(s).”
- In this description, the terms “engine,” “processing engine,” “master processing engine,” “master component” and the like are used to refer to any component within a system on a chip (“SoC”) that generates transaction requests to closely coupled memory devices and/or to components of a memory subsystem via a bus. As such, a master component may refer to, but is not limited to refer to, a CPU, DSP, GPU, modem, controller, display, camera, etc. A master component comprised within an embodiment of the solution, depending on its particular function and needs, may leverage a customized cache-level hashing technique or, alternatively, may not. For a given master component that does not leverage a customized cache-level hashing technique, it is envisioned that it may comprise dedicated hardware to generate a special transaction traffic pattern particularly suited for that given master component.
- In this description, the terms “writeback” and “flush” refer to the process of updating data and/or instructions instantiated in a DDR based on fresher versions of the data and/or instructions that exist in a closely coupled memory (e.g., an L2 cache) associated with one or more master components. One of ordinary skill in the art will understand that data instantiated in a closely coupled memory to a processing component, such as a low level L2 cache memory, may have a virtual memory address associated with a memory address in DDR. The DDR memory address may be associated with a certain data bank in the DDR for storing data in either a compressed or decompressed format, as would be understood by one of ordinary skill in the art. Based on the virtual memory address and the presence of a “dirty” bit for data stored in a cache, a memory controller may seek to update DDR, as would be understood by one of ordinary skill in the art. The term “dirty bit” will be understood to be a bit associated with a virtual memory page in a cache that indicates that the data stored in the memory page has been generated anew or modified from its original state by a master component, but not yet written back to DDR. When a writeback transaction to the DDR seeks to update a memory address in the same bank as another memory address which is the target of a read transaction, a page conflict may occur. To mitigate page conflicts, embodiments of the solution leverage cache-level hashing techniques to store data in virtual memory addresses that point to DDR memory addresses in different banks.
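- The writeback decision described above can be sketched as follows. This is an illustrative example only; the structure layout, the field names, and the 64-byte line size are assumptions chosen for the sketch.

```c
/* Illustrative sketch only: a cache line carrying a dirty bit, and the
 * writeback decision it drives. Field names and the 64B line size used
 * here are assumptions for the example. */
#include <stdbool.h>
#include <stdint.h>

struct cache_line {
    uint64_t tag;       /* identifies the backing DDR address of the line */
    bool     valid;
    bool     dirty;     /* set when a master modifies the line; the DDR copy is stale */
    uint8_t  data[64];  /* one 64-byte cache line */
};

/* On eviction or flush, only valid, dirty lines require a writeback to DDR. */
bool needs_writeback(const struct cache_line *line)
{
    return line->valid && line->dirty;
}
```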
- In this description, the term “bus” refers to a collection of wires through which data is transmitted from a processing engine to a memory component or other device located on or off the SoC. It will be understood that a bus consists of two parts—an address bus and a data bus where the data bus transfers actual data and the address bus transfers information specifying location of the data in a memory component. The term “width” or “bus width” or “bandwidth” refers to an amount of data, i.e., a “chunk size,” that may be transmitted per cycle through a given bus. For example, a 16-byte bus may transmit 16 bytes of data at a time, whereas a 32-byte bus may transmit 32 bytes of data per cycle. Moreover, “bus speed” refers to the number of times a chunk of data may be transmitted through a given bus each second. Similarly, a “bus cycle” or “cycle” refers to transmission of one chunk of data through a given bus.
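- A short worked example ties these terms together: effective throughput is the chunk size (bus width) multiplied by the number of cycles per second (bus speed). The 800 MHz figure below is an arbitrary assumption used only to make the arithmetic concrete.

```c
/* Worked example of the terms above: throughput is chunk size (bus width)
 * times cycles per second (bus speed). The 800 MHz figure is an assumption. */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t width_bytes = 16;            /* a 16-byte bus moves 16 bytes per cycle */
    uint64_t speed_hz    = 800000000ULL;  /* assumed 800 million cycles per second */
    uint64_t bytes_per_s = width_bytes * speed_hz;
    printf("%llu bytes per second\n", (unsigned long long)bytes_per_s); /* 12.8 GB/s */
    return 0;
}
```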
- In this description, the term “portable computing device” (“PCD”) is used to describe any device operating on a limited capacity power supply, such as a battery. Although battery operated PCDs have been in use for decades, technological advances in rechargeable batteries coupled with the advent of third generation (“3G”) and fourth generation (“4G”) wireless technology have enabled numerous PCDs with multiple capabilities. Therefore, a PCD may be a cellular telephone, a satellite telephone, a pager, a PDA, a smartphone, a navigation device, a smartbook or reader, a media player, a combination of the aforementioned devices, a laptop computer with a wireless connection, among others.
- In current systems and methods, master components running simultaneously in a PCD create an intermingled flow of read and write transaction requests that necessitate access to dispersed regions of a DDR memory component. Each transaction consumes power and bus bandwidth as compressed and decompressed data are transmitted over a bus and marshaled by a memory controller to and from a DDR component. Consequently, queues of transaction requests seeking to access data in shared regions of the DDR may not only consume unnecessary amounts of power, but also create memory contentions and bus traffic congestion that work to detrimentally increase transaction latencies. Similarly, and as one of ordinary skill in the art would understand, the quality of service (“QoS”) experienced by a user of a PCD may suffer when excessive amounts of bandwidth and power capacity are utilized to service transaction requests bound to a DDR.
- As one of ordinary skill in the art would understand, master processing components in a shared memory multiprocessor system use the memory subsystem to exchange information and perform synchronization. Consequently, memory contention (i.e., page conflicts) associated with multiple read and write transactions seeking simultaneous access to a common bank of the shared memory subsystem may cause QoS to suffer.
- Prior art solutions combat memory contention by employing a hashing block at the memory controller in an effort to disperse data storage across multiple banks which may be accessed simultaneously. A read transaction and a write transaction emanating from an L2 cache of a master component may both point to the same DDR bank, but the hashing block of the memory controller works to accommodate both requests from different banks. A problem with employing a hashing block at the memory controller, however, is that the hashing algorithm used may not be optimal for all master components that share access to the DDR. Moreover, because some master components may include specialized traffic pattern generators, applying the hashing algorithm at the memory controller to their particular transaction traffic may be detrimental to their functionality. Consequently, prior art solutions often employ an “extra step” to validate the master component from which a given transaction has emanated before subjecting the given transaction to the hashing block of the memory controller. Such validation steps contribute to bus congestion, power consumption and increased latencies.
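- The prior art arrangement described above can be sketched as follows. The function names, the eligibility check, and the toy hash are assumptions for illustration; the point is that every transaction pays for a source-validation step before the single, shared hashing block is applied.

```c
/* Illustrative sketch only of the prior-art arrangement described above:
 * one global bank hash applied at the memory controller, guarded by a
 * per-transaction check of the issuing master. All names and the toy hash
 * are assumptions for this example. */
#include <stdbool.h>
#include <stdint.h>

/* Assumed lookup: masters with specialized traffic-pattern generators opt out. */
bool master_is_hash_eligible(unsigned master_id)
{
    return master_id != 2;  /* e.g., assume master 2 has a fixed traffic pattern */
}

/* One-size-fits-all bank hash: fold higher address bits into the bank bits. */
uint64_t global_bank_hash(uint64_t addr)
{
    return addr ^ (((addr >> 17) & 0x7u) << 13);
}

/* Every transaction incurs the validation step before (possibly) being hashed. */
uint64_t memory_controller_map(unsigned master_id, uint64_t addr)
{
    if (!master_is_hash_eligible(master_id))
        return addr;              /* validated: bypass the hashing block */
    return global_bank_hash(addr);
}
```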
- Advantageously, embodiments of a cache-level memory management (“CMM”) solution avoid hashing and validation at the memory controller by employing dedicated, customized hashing modules at the lower level caches of each master component. For those master components that do not benefit from hashing, no dedicated hashing module may be comprised or, alternatively, a hashing module dedicated to the given master component may be bypassed or “turned off.” In this way, competing transactions emanating from a given master component may be associated with virtual memory addresses in the closely coupled cache that already point to different DDR banks when received by the memory controller.
- FIG. 1 is a functional block diagram illustrating an exemplary, non-limiting aspect of a portable computing device (“PCD”) in the form of a wireless telephone for implementing cache-level memory management (“CMM”) systems and methods. As shown, the PCD 100 includes an on-chip system 102 that includes a multi-core central processing unit (“CPU”) 110 and an analog signal processor 126 that are coupled together. The CPU 110 may comprise a zeroth core 222, a first core 224, and an Nth core 230 as understood by one of ordinary skill in the art. Further, instead of a CPU 110, a digital signal processor (“DSP”) may also be employed as understood by one of ordinary skill in the art.
- In general, the memory subsystem 112 comprises, inter alia, a memory controller 215, dedicated caches for master components, and a DDR memory 115 (collectively depicted in the FIG. 1 illustration as memory subsystem 112). The memory subsystem 112 in general, and some of its components specifically, may be formed from hardware and/or firmware and may be responsible for cache-level memory management using component-customized hashing methodologies that work to mitigate page conflicts when accessing data and instructions stored in the DDR memory 115. Advantageously, by performing cache-level hashing in the memory subsystem 112, CMM solutions optimize transaction latencies when data is flushed to, or read from, non-volatile memory as well as minimize data traffic on the bus 205 (not shown in FIG. 1) and reduce power consumption on the SoC.
- As illustrated in FIG. 1, a display controller 128 and a touch screen controller 130 are coupled to the digital signal processor 110. A touch screen display 132 external to the on-chip system 102 is coupled to the display controller 128 and the touch screen controller 130. PCD 100 may further include a video encoder 134, e.g., a phase-alternating line (“PAL”) encoder, a sequential couleur avec memoire (“SECAM”) encoder, a national television system(s) committee (“NTSC”) encoder or any other type of video encoder 134. The video encoder 134 is coupled to the multi-core CPU 110. A video amplifier 136 is coupled to the video encoder 134 and the touch screen display 132. A video port 138 is coupled to the video amplifier 136.
- As depicted in FIG. 1, a universal serial bus (“USB”) controller 140 is coupled to the CPU 110. Also, a USB port 142 is coupled to the USB controller 140. The memory subsystem 112, which may include a PoP memory, a mask ROM/Boot ROM, a boot OTP memory, a DDR memory 115 (see subsequent Figures), caches and customized hashing modules, may also be coupled to the CPU 110 and/or include its own dedicated processor(s). A subscriber identity module (“SIM”) card 146 may also be coupled to the CPU 110. Further, as shown in FIG. 1, a digital camera 148 may be coupled to the CPU 110. In an exemplary aspect, the digital camera 148 is a charge-coupled device (“CCD”) camera or a complementary metal-oxide semiconductor (“CMOS”) camera.
- As further illustrated in FIG. 1, a stereo audio CODEC 150 may be coupled to the analog signal processor 126. Moreover, an audio amplifier 152 may be coupled to the stereo audio CODEC 150. In an exemplary aspect, a first stereo speaker 154 and a second stereo speaker 156 are coupled to the audio amplifier 152. FIG. 1 shows that a microphone amplifier 158 may be also coupled to the stereo audio CODEC 150. Additionally, a microphone 160 may be coupled to the microphone amplifier 158. In a particular aspect, a frequency modulation (“FM”) radio tuner 162 may be coupled to the stereo audio CODEC 150. Also, an FM antenna 164 is coupled to the FM radio tuner 162. Further, stereo headphones 166 may be coupled to the stereo audio CODEC 150.
- FIG. 1 further indicates that a radio frequency (“RF”) transceiver 168 may be coupled to the analog signal processor 126. An RF switch 170 may be coupled to the RF transceiver 168 and an RF antenna 172. As shown in FIG. 1, a keypad 174 may be coupled to the analog signal processor 126. Also, a mono headset with a microphone 176 may be coupled to the analog signal processor 126. Further, a vibrator device 178 may be coupled to the analog signal processor 126. FIG. 1 also shows that a power supply 188, for example a battery, is coupled to the on-chip system 102 through a power management integrated circuit (“PMIC”) 180. In a particular aspect, the power supply 188 includes a rechargeable DC battery or a DC power supply that is derived from an alternating current (“AC”) to DC transformer that is connected to an AC power source.
- The CPU 110 may also be coupled to one or more internal, on-chip thermal sensors 157A as well as one or more external, off-chip thermal sensors 157B. The on-chip thermal sensors 157A may comprise one or more proportional to absolute temperature (“PTAT”) temperature sensors that are based on vertical PNP structure and are usually dedicated to complementary metal oxide semiconductor (“CMOS”) very large-scale integration (“VLSI”) circuits. The off-chip thermal sensors 157B may comprise one or more thermistors. The thermal sensors 157 may produce a voltage drop that is converted to digital signals with an analog-to-digital converter (“ADC”) controller (not shown). However, other types of thermal sensors 157 may be employed.
- The touch screen display 132, the video port 138, the USB port 142, the camera 148, the first stereo speaker 154, the second stereo speaker 156, the microphone 160, the FM antenna 164, the stereo headphones 166, the RF switch 170, the RF antenna 172, the keypad 174, the mono headset 176, the vibrator 178, thermal sensors 157B, the PMIC 180 and the power supply 188 are external to the on-chip system 102. It will be understood, however, that one or more of these devices depicted as external to the on-chip system 102 in the exemplary embodiment of a PCD 100 in FIG. 1 may reside on chip 102 in other exemplary embodiments.
- In a particular aspect, one or more of the method steps described herein may be implemented by executable instructions and parameters stored in the memory subsystem 112 or that form the memory controller 215, the cache(s) and/or the cache index hash module(s) 202 (see FIG. 4). Further, the memory controller 215, the cache(s) and/or the cache index hash module(s) 202, the instructions stored therein, or a combination thereof may serve as a means for performing one or more of the method steps described herein.
- FIG. 2 is an illustration of read and write stream flows affecting DDR memory utilization. The FIG. 2A illustration, specifically, depicts a pair of read and write transaction streams emanating from a cache associated with a master component, such as a CPU core, for example. The virtual memory addresses in the cache point to DDR memory addresses located in a Data Bank A and, as such, both transaction streams are competing for access to the same data bank. The DDR page conflict may cause unacceptable increases in transaction latencies, unnecessary power consumption, bus congestion, etc.
- By contrast, the FIG. 2B illustration depicts the same pair of read and write transaction streams emanating from a cache associated with a master component. In the FIG. 2B illustration, however, the memory addresses associated with the read and write streams have been hashed such that the streams are directed to two different banks of the DDR, the read stream directed to Data Bank A and the write stream directed to Data Bank B. In this way, the transactions of both streams may be simultaneously serviced. As explained above, prior art solutions try to accomplish such hashing by employing a hash module at the memory controller, a method which often requires a validation step to make sure that each stream is eligible for hashing. Embodiments of a cache-level memory management solution, however, employ hashing functions at the cache level so that each master component benefits from a hash methodology that is optimized for its particular needs. Also, because in a CMM solution the transaction streams emanating from the low level caches of master components are already hashed upon arrival at the memory controller, a validation step is unnecessary. The use of customized hashing algorithms at the cache level in order to dictate long term storage addresses in the DDR enables embodiments of a CMM solution to avoid DDR page conflicts, which may result in relatively lower transaction latencies when compared to prior art approaches.
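- The contrast between FIG. 2A and FIG. 2B can be made concrete with a small demonstration. The two example addresses and the toy cache-level hash below (which folds addr[19:17] into the bank bits addr[15:13], in the spirit of the XOR example given later in this description) are assumptions chosen only for this demo.

```c
/* Illustrative sketch only of the FIG. 2A / FIG. 2B contrast: two example
 * addresses collide in one DDR bank when unhashed, and land in different
 * banks after a toy cache-level hash. */
#include <stdint.h>
#include <stdio.h>

#define BANK_SHIFT 13u
#define BANK_MASK  0x7u   /* three bank bits at addr[15:13] */

unsigned bank_of(uint64_t addr) { return (unsigned)((addr >> BANK_SHIFT) & BANK_MASK); }

/* Toy hash: fold addr[19:17] into the bank bits, leave all other bits alone. */
uint64_t toy_cache_level_hash(uint64_t addr)
{
    uint64_t mixed = ((addr >> 17) ^ (addr >> BANK_SHIFT)) & BANK_MASK;
    return (addr & ~((uint64_t)BANK_MASK << BANK_SHIFT)) | (mixed << BANK_SHIFT);
}

int main(void)
{
    uint64_t read_addr  = 0x4A000;  /* both unhashed addresses fall in bank 5:   */
    uint64_t write_addr = 0x8A000;  /* a page conflict, as in FIG. 2A            */
    printf("unhashed: read bank %u, write bank %u\n",
           bank_of(read_addr), bank_of(write_addr));
    printf("hashed:   read bank %u, write bank %u\n",  /* different banks, as in FIG. 2B */
           bank_of(toy_cache_level_hash(read_addr)),
           bank_of(toy_cache_level_hash(write_addr)));
    return 0;
}
```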
- FIG. 3 is a functional block diagram illustrating a prior art system for memory management using a DDR bank hashing module 216. In the FIG. 3 illustration, the master components 201 are processing workloads according to the demands of various application clients and, in doing so, are issuing read and write transactions to the DDR 115 via bus 205. - The transactions are marshaled by
memory controller 215. Because the read and write transaction streams may be directed to memory addresses in the DDR 115 which are located in a common bank, the memory controller 215 utilizes DDR bank hashing module 216 to employ a single hashing algorithm, as would be understood by one of ordinary skill in the art. The purpose of the hashing step by the DDR bank hashing module 216 is to point the various transactions to memory addresses residing in different banks (e.g., data bank A and data bank B) of the DDR 115 to minimize page conflicts. A consequence of using one hashing algorithm for all transaction streams, without regard for the master component from which a given stream was issued, is that the hashing algorithm will inevitably be non-optimal for one or more of the master components. Further, because not all transaction streams are eligible for hashing (such as data streams emanating from a master component comprising hardware for generating a specific traffic pattern), validation of those streams must be taken into consideration during development so that hashing is bypassed. This validation step may come at a high cost to transaction latency, as the additional time required to identify eligible transactions prior to hashing adds to the overall time required to fulfill each transaction. Moreover, the validation step may also come at a high cost in power consumption and bus congestion.
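- For contrast with the cache-level approach described next, the sketch below models the prior art controller-level flow of FIG. 3 in C. The transaction structure, field names, and the particular hash are illustrative assumptions only; the point is that one global algorithm is applied to every eligible stream and a per-stream validation check sits in the request path.

```c
#include <stdint.h>
#include <stdbool.h>

/* A transaction as seen by the memory controller (illustrative layout only). */
typedef struct {
    uint32_t addr;           /* DDR address targeted by the request           */
    int      source_id;      /* master component that issued the request      */
    bool     hash_eligible;  /* false e.g. for fixed-traffic-pattern hardware */
} transaction_t;

/* One hashing algorithm shared by all streams, regardless of source. */
static uint32_t global_bank_hash(uint32_t addr)
{
    uint32_t bank = ((addr >> 13) & 0x7u) ^ ((addr >> 17) & 0x7u);
    return (addr & ~(0x7u << 13)) | (bank << 13);
}

/* Controller-side mapping: validate the stream, then either bypass or hash. */
static uint32_t controller_map(const transaction_t *t)
{
    if (!t->hash_eligible)      /* validation/bypass step in the request path */
        return t->addr;
    return global_bank_hash(t->addr);
}
```

The eligibility check runs for every request, which illustrates the latency, power, and bus-congestion cost that the description attributes to validating streams at the memory controller.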
- FIG. 4 is a functional block diagram illustrating an exemplary embodiment of an on-chip system 102 for cache-level memory management (“CMM”) of a memory subsystem 112. In carrying out various workloads, master components 203 generate transaction requests for either updating or returning data and instructions stored in DDR 115. As would be understood by one of ordinary skill in the art, the transaction requests are directed over a bus 205 to a memory controller 215 that marshals the requests and manages the DDR 115 image. - Advantageously, the master components 203 may comprise a cache index hash module 202 (as illustrated with
master components 203A, 203B) that works to hash the DDR memory locations for transaction streams emanating from the respective L2 caches. Because the cache index hashing modules 202 are dedicated to their respective master components 203A, 203B, the particular hashing algorithms employed by the modules 202 may be tailored to the needs of the respective master components (notably, not every master component need comprise a cache index hash module, as illustrated by master component 203 n). - Regarding hashing algorithms that may be employed by a given cache index hashing module 202, it is envisioned that those of ordinary skill in the art may select hashing algorithms best suited for individual master components. And so, a CMM embodiment is not limited to the use of any particular hashing algorithm or hardware arrangement for employing a hashing algorithm. By way of example, though, and not limitation, it is envisioned that memory addresses may be hashed using an XOR gate, as would be understood by one of ordinary skill in the art. Assume, for example, that DDR bank bits are addressed as [15:13] and that a 1 MB L2 cache associated with a given master component has a 64B line size. In this non-limiting example, a cache index hash module 202 may apply logic represented as:
- L2_cache_index_addr[15:13]=addr[19:17] XOR addr[15:13]
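- As a concrete (but non-authoritative) model of the logic stated above, the following C function applies the XOR to a full address: the bits at positions [15:13] are replaced by their XOR with addr[19:17], while all other bits pass through unchanged. The function name and the choice to return a rewritten address are assumptions made for the sketch.

```c
#include <stdint.h>

/* L2_cache_index_addr[15:13] = addr[19:17] XOR addr[15:13], modeled in C. */
static uint32_t l2_cache_index_hash(uint32_t addr)
{
    uint32_t bank_bits = (addr >> 13) & 0x7u;   /* addr[15:13] */
    uint32_t high_bits = (addr >> 17) & 0x7u;   /* addr[19:17] */
    uint32_t hashed    = bank_bits ^ high_bits;

    /* Substitute the hashed value back into bit positions [15:13]. */
    return (addr & ~(0x7u << 13)) | (hashed << 13);
}
```

In hardware this amounts to three 2-input XOR gates on the index lines, consistent with the XOR gate example given above.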
- Returning to the
FIG. 4 illustration, DDR memory 115 is depicted to comprise two memory regions, a Data Bank A and a Data Bank B. The memory controller 215, which may comprise its own processor, receives the transaction requests generated by the master components 203, already hashed via the cache index hash module(s) 202. Because the transaction requests arrive at the memory controller already hashed, validation of the transaction source prior to hashing may be avoided in embodiments of a CMM solution. - Advantageously, by “pre-hashing” transaction requests to the
memory subsystem 112 at the cache level, CMM solutions may relieve congestion on the bus 205, improve average transaction latency, optimize memory capacity utilization and minimize power consumption across the SoC 102. Instead of using a single DDR bank hashing methodology, each application client in a CMM embodiment benefits from its master components having a dedicated, customized cache index hashing module 202 which may be tuned based on the master component's unique traffic pattern and cache organization. In this way, a CMM embodiment may deliver an overall memory performance that is optimized. Additionally, CMM embodiments provide an option for certain master components to opt out of hashing, such as those application clients and master components that employ dedicated hardware to generate specific traffic patterns that should not be changed. -
FIG. 5 is a logical flowchart illustrating an exemplary method 500 for cache-level memory management (“CMM”) according to an embodiment of the solution. Beginning at block 505, for each application client and/or master processing component, the method determines whether the transaction traffic should be hashed. For those components not requiring or benefitting from hashing, the “no” branch is followed from decision block 510 to block 525. At block 525, any transaction streams associated with those components not requiring or benefitting from hashed memory addresses are transmitted to the memory controller unhashed. The memory controller receives the transactions and fulfills them from the DDR, as would be understood by one of ordinary skill in the art. - Returning to decision block 510, if a client application and/or master processing component is eligible for having the memory addresses identified in its transaction streams hashed, the
method 500 follows the “yes” branch from decision block 510 to block 515. At block 515, a cache index hash module 202 is customized with a hashing algorithm that is optimal for the given client and/or master component. The customized hashing algorithm is applied to the memory addresses identified in the various transaction streams such that, when the transactions are forwarded to a memory controller at block 520, the DDR memory addresses identified in them are likely to reside in differing DDR banks. In this way, embodiments of CMM solutions minimize page conflicts at the DDR, optimize cache usage for each given client, reduce transaction latencies, and avoid unnecessary validation steps which consume power and contribute to bus congestion. The method 500 returns. - Certain steps in the processes or process flows described in this specification naturally precede others for the invention to function as described. However, the invention is not limited to the order of the steps described if such order or sequence does not alter the functionality of the invention. That is, it is recognized that some steps may be performed before, after, or in parallel with (substantially simultaneously with) other steps without departing from the scope and spirit of the invention. In some instances, certain steps may be omitted or not performed without departing from the invention. Further, words such as “thereafter”, “then”, “next”, etc. are not intended to limit the order of the steps. These words are simply used to guide the reader through the description of the exemplary method.
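- The following C sketch is one informal way to model the method 500 flow of FIG. 5 described above. The per-master configuration structure, the function-pointer representation of a customized hash, and the example hash itself are assumptions introduced for illustration; they are not elements recited by the specification or claims.

```c
#include <stdint.h>
#include <stddef.h>

/* Customized per-master hash (block 515), or NULL for masters that opt out. */
typedef uint32_t (*cache_index_hash_fn)(uint32_t addr);

typedef struct {
    int                 master_id;
    cache_index_hash_fn hash;    /* NULL => "no" branch of decision block 510 */
} master_config_t;

/* Example customized hash for one master: fold addr[19:17] into addr[15:13]. */
static uint32_t example_hash(uint32_t addr)
{
    uint32_t bank = ((addr >> 13) & 0x7u) ^ ((addr >> 17) & 0x7u);
    return (addr & ~(0x7u << 13)) | (bank << 13);
}

/* Route one transaction address from a master's cache toward the controller. */
static uint32_t cmm_route(const master_config_t *cfg, uint32_t addr)
{
    if (cfg->hash == NULL) {
        /* Block 525: transmit to the memory controller unhashed. */
        return addr;
    }
    /* Blocks 515/520: apply the master-specific hash, then forward the
     * pre-hashed address; no validation is needed at the controller. */
    return cfg->hash(addr);
}
```

A master with a fixed traffic pattern would be configured with a NULL hash, while each remaining master carries its own tuned function, mirroring the opt-out and per-master customization described above.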
- Additionally, one of ordinary skill in programming is able to write computer code or identify appropriate hardware and/or circuits to implement the disclosed invention without difficulty based on the flow charts and associated description in this specification, for example. Therefore, disclosure of a particular set of program code instructions or detailed hardware devices or software instructions and data structures is not considered necessary for an adequate understanding of how to make and use the invention. The inventive functionality of the claimed computer implemented processes is explained in more detail in the above description and in conjunction with the drawings, which may illustrate various process flows.
- In one or more exemplary embodiments, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted as one or more instructions or code on a computer-readable device. Computer-readable devices include both computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that may be accessed by a computer. By way of example, and not limitation, such computer-readable media may comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that may be used to carry or store desired program code in the form of instructions or data structures and that may be accessed by a computer.
- Therefore, although selected aspects have been illustrated and described in detail, it will be understood that various substitutions and alterations may be made therein without departing from the spirit and scope of the present invention, as defined by the following claims.
Claims (30)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/054,295 US9747209B1 (en) | 2016-02-26 | 2016-02-26 | System and method for improved memory performance using cache level hashing |
PCT/US2017/015187 WO2017146864A1 (en) | 2016-02-26 | 2017-01-26 | System and method for improved memory performance using cache level hashing |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/054,295 US9747209B1 (en) | 2016-02-26 | 2016-02-26 | System and method for improved memory performance using cache level hashing |
Publications (2)
Publication Number | Publication Date |
---|---|
US9747209B1 US9747209B1 (en) | 2017-08-29 |
US20170249249A1 true US20170249249A1 (en) | 2017-08-31 |
Family
ID=58016842
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/054,295 Expired - Fee Related US9747209B1 (en) | 2016-02-26 | 2016-02-26 | System and method for improved memory performance using cache level hashing |
Country Status (2)
Country | Link |
---|---|
US (1) | US9747209B1 (en) |
WO (1) | WO2017146864A1 (en) |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0821003B2 (en) | 1992-08-07 | 1996-03-04 | インターナショナル・ビジネス・マシーンズ・コーポレイション | Adder / hash circuit for computer cache system |
US6289358B1 (en) | 1998-04-15 | 2001-09-11 | Inktomi Corporation | Delivering alternate versions of objects from an object cache |
US6470442B1 (en) | 1999-07-30 | 2002-10-22 | International Business Machines Corporation | Processor assigning data to hardware partition based on selectable hash of data address |
US7290116B1 (en) | 2004-06-30 | 2007-10-30 | Sun Microsystems, Inc. | Level 2 cache index hashing to avoid hot spots |
US8706966B1 (en) | 2009-12-16 | 2014-04-22 | Applied Micro Circuits Corporation | System and method for adaptively configuring an L2 cache memory mesh |
US20140006538A1 (en) | 2012-06-28 | 2014-01-02 | Bytemobile, Inc. | Intelligent Client-Side Caching On Mobile Devices |
US20150199134A1 (en) | 2014-01-10 | 2015-07-16 | Qualcomm Incorporated | System and method for resolving dram page conflicts based on memory access patterns |
- 2016-02-26: US US15/054,295 patent/US9747209B1/en not_active Expired - Fee Related
- 2017-01-26: WO PCT/US2017/015187 patent/WO2017146864A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
US9747209B1 (en) | 2017-08-29 |
WO2017146864A1 (en) | 2017-08-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109074331B (en) | Power reduced memory subsystem with system cache and local resource management | |
US9703493B2 (en) | Single-stage arbiter/scheduler for a memory system comprising a volatile memory and a shared cache | |
JP6859361B2 (en) | Performing memory bandwidth compression using multiple Last Level Cache (LLC) lines in a central processing unit (CPU) -based system | |
TWI773683B (en) | Providing memory bandwidth compression using adaptive compression in central processing unit (cpu)-based systems | |
US20170109090A1 (en) | System and method for page-by-page memory channel interleaving | |
US9367474B2 (en) | Translating cache hints | |
US9043570B2 (en) | System cache with quota-based control | |
US20170108914A1 (en) | System and method for memory channel interleaving using a sliding threshold address | |
US20190361807A1 (en) | Dynamic adjustment of memory channel interleave granularity | |
US9311251B2 (en) | System cache with sticky allocation | |
US20170024145A1 (en) | Address translation and data pre-fetch in a cache memory system | |
US20170108911A1 (en) | System and method for page-by-page memory channel interleaving | |
US9489305B2 (en) | System and method for managing bandwidth and power consumption through data filtering | |
US9747209B1 (en) | System and method for improved memory performance using cache level hashing | |
US10152261B2 (en) | Providing memory bandwidth compression using compression indicator (CI) hint directories in a central processing unit (CPU)-based system | |
US9354812B1 (en) | Dynamic memory utilization in a system on a chip | |
US20180336141A1 (en) | Worst-case memory latency reduction via data cache preloading based on page table entry read data | |
US9251096B2 (en) | Data compression in processor caches | |
US20170178275A1 (en) | Method and system for using solid state device as eviction pad for graphics processing unit | |
CN108885587B (en) | Power reduced memory subsystem with system cache and local resource management | |
US20190073323A1 (en) | Buffering transaction requests to a subsystem via a bus interconnect | |
US20160320972A1 (en) | Adaptive compression-based paging |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: QUALCOMM INCORPORATED, CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:WANG, FENG;RYCHLIK, BOHUSLAV;REEL/FRAME:038729/0677 Effective date: 20160519 |
|
FEPP | Fee payment procedure |
Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN) |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
FEPP | Fee payment procedure |
Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
LAPS | Lapse for failure to pay maintenance fees |
Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
STCH | Information on status: patent discontinuation |
Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
|
FP | Lapsed due to failure to pay maintenance fee |
Effective date: 20210829 |