US20180286473A1 - Systems and methods for reducing memory power consumption via pre-filled dram values - Google Patents
- Publication number
- US20180286473A1 (application US15/472,622)
- Authority
- US
- United States
- Prior art keywords
- fill
- values
- predetermined pattern
- dram
- memory
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C11/00—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
- G11C11/21—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements
- G11C11/34—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices
- G11C11/40—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors
- G11C11/401—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors forming cells needing refreshing or charge regeneration, i.e. dynamic cells
- G11C11/4063—Auxiliary circuits, e.g. for addressing, decoding, driving, writing, sensing or timing
- G11C11/407—Auxiliary circuits, e.g. for addressing, decoding, driving, writing, sensing or timing for memory cells of the field-effect type
- G11C11/409—Read-write [R-W] circuits
- G11C11/4091—Sense or sense/refresh amplifiers, or associated sense circuitry, e.g. for coupled bit-line precharging, equalising or isolating
- G11C11/4072—Circuits for initialization, powering up or down, clearing memory or presetting
- G11C11/4093—Input/output [I/O] data interface arrangements, e.g. data buffers
- G11C11/4096—Input/output [I/O] data management or control circuits, e.g. reading or writing circuits, I/O drivers or bit-line switches
Landscapes
- Engineering & Computer Science (AREA)
- Microelectronics & Electronic Packaging (AREA)
- Computer Hardware Design (AREA)
- Databases & Information Systems (AREA)
- Dram (AREA)
Abstract
Systems and methods are disclosed for reducing memory power consumption via pre-filled dynamic random access memory (DRAM) values. One embodiment is a method for providing DRAM values. A fill request is received from an executing program to fill an allocated portion of the DRAM with a predetermined pattern of values. The predetermined pattern of values is stored in a fill value memory residing in the DRAM. A fill command is sent to the DRAM. In response to the fill command, a plurality of sense amp latches are connected to the fill value memory to update the corresponding sense amp latch bits with the predetermined pattern of values stored in the fill value memory.
Description
- Portable computing devices (e.g., cellular telephones, smart phones, tablet computers, portable digital assistants (PDAs), portable game consoles, wearable devices, and other battery-powered devices) and other computing devices continue to offer an ever-expanding array of features and services, and provide users with unprecedented levels of access to information, resources, and communications. To keep pace with these service enhancements, such devices have become more powerful and more complex. Portable computing devices now commonly include a system on chip (SoC) comprising a plurality of memory clients embedded on a single substrate (e.g., one or more central processing units (CPUs), a graphics processing unit (GPU), digital signal processors (DSPs), etc.). The memory clients may read data from and store data in an external system memory (i.e., random access memory (RAM)) electrically coupled to the SoC via a high-speed bus.
- Programs running on the processing devices (e.g., software applications, application frameworks, services, etc.) often rely on an initialization value for allocated RAM memory. The programs may request that a portion of RAM is to be filled with a constant value, such as all zeros. The programs may use an explicit assignment or copy and/or write operations. Conventional methods for performing RAM initialization suffer from various disadvantages. The CPU or processor must execute code to perform the RAM initialization, which contributes to CPU power consumption. To perform the initialization, a program may issue multiple write transactions via the SoC bus and the RAM memory controller to fill the RAM with the constant values. This can result in substantial traffic on these paths when the size of the portion to fill is relatively large, which may be encountered when zero initializing a display frame buffer or camera frame buffer. The traffic flowing through the SoC bus and the RAM bus may significantly contribute to memory power consumption. Furthermore, during the fill, other clients (e.g., GPU, DSPs, etc.) must share the RAM bus and may be stalled while the flooding occurs, creating system quality of service (QoS) and stability issues. These problems associated with RAM initialization may be exacerbated as demands for RAM capacity continue to increase.
- Accordingly, there is a need for improved systems and methods for initializing RAM while reducing power consumption and maintaining performance and with minimal involvement of the processor, bus, and interfaces.
- Systems and methods are disclosed for reducing memory power consumption via pre-filled dynamic random access memory (DRAM) values. One embodiment is a method for setting DRAM values. A fill request is received from an executing program to fill an allocated portion of the DRAM with a predetermined pattern of values. The predetermined pattern of values is stored in a fill value memory residing in the DRAM. A fill command is sent to the DRAM. In response to the fill command, a plurality of sense amp latches are connected to the fill value memory to update the corresponding sense amp latch bits with the predetermined pattern of values stored in the fill value memory.
- Another embodiment is a system comprising a system on chip (SoC) and DRAM. The SoC comprises a processing device and a memory controller. The DRAM is electrically coupled to the memory controller via a bus. The DRAM comprises a fill value memory and a plurality of sense amp latches. The fill value memory is used to pre-fill a predetermined pattern of values for an allocated portion of a memory cell array. The plurality of sense amp latches are electrically coupled to the fill value memory for updating corresponding sense amp latch bits with the predetermined pattern of values in response to a fill command received from the memory controller.
- In the Figures, like reference numerals refer to like parts throughout the various views unless otherwise indicated. For reference numerals with letter character designations such as “102A” or “102B”, the letter character designations may differentiate two like parts or elements present in the same Figure. Letter character designations for reference numerals may be omitted when it is intended that a reference numeral encompass all parts having the same reference numeral in all Figures.
- FIG. 1 is a block diagram of an embodiment of a system for reducing memory power consumption via pre-filled DRAM values.
- FIG. 2 is a flowchart illustrating an embodiment of a method for reducing memory power consumption in the system of FIG. 1 by pre-filling DRAM values.
- FIG. 3 is a circuit diagram illustrating a prior art sense amp latch structure.
- FIG. 4 is a circuit diagram illustrating an embodiment of a sense amp latch structure that may be incorporated in the DRAM of FIG. 1 for pre-filling DRAM values.
- FIG. 5 is a flow/block diagram illustrating the data structure and operation of an embodiment of a memory fill command for initializing the sense amp latch bits with the values stored in the fill value memory.
- FIG. 6 is a flow/block diagram illustrating the data structure and operation of an embodiment of a DRAM initialization command for initializing the values stored in the fill value memory.
- FIG. 7 is a flow/block diagram illustrating an embodiment of the DDR fill driver for pre-filling DRAM values in the system of FIG. 1.
- FIG. 8 is a flowchart illustrating the architecture, operation, and/or functionality of an embodiment of the DDR fill driver in FIG. 7.
- FIG. 9 is a flow/block diagram illustrating an embodiment of a method implemented in the system of FIG. 1.
- FIG. 10 is a block diagram of an embodiment of a portable computing device for incorporating the system of FIG. 1.
- The word “exemplary” is used herein to mean “serving as an example, instance, or illustration.” Any aspect described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects.
- In this description, the term “application” may also include files having executable content, such as: object code, scripts, byte code, markup language files, and patches. In addition, an “application” referred to herein, may also include files that are not executable in nature, such as documents that may need to be opened or other data files that need to be accessed.
- The term “content” may also include files having executable content, such as: object code, scripts, byte code, markup language files, and patches. In addition, “content” referred to herein, may also include files that are not executable in nature, such as documents that may need to be opened or other data files that need to be accessed.
- As used in this description, the terms “component,” “database,” “module,” “system,” and the like are intended to refer to a computer-related entity, either hardware, firmware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a computing device and the computing device may be a component. One or more components may reside within a process and/or thread of execution, and a component may be localized on one computer and/or distributed between two or more computers. In addition, these components may execute from various computer readable media having various data structures stored thereon. The components may communicate by way of local and/or remote processes such as in accordance with a signal having one or more data packets (e.g., data from one component interacting with another component in a local system, distributed system, and/or across a network such as the Internet with other systems by way of the signal).
- In this description, the terms “communication device,” “wireless device,” “wireless telephone,” “wireless communication device,” and “wireless handset” are used interchangeably. With the advent of third generation (“3G”) and fourth generation (“4G”) wireless technology, greater bandwidth availability has enabled more portable computing devices with a greater variety of wireless capabilities. Therefore, a portable computing device may include a cellular telephone, a pager, a PDA, a smartphone, a navigation device, or a hand-held computer with a wireless connection or link.
- FIG. 1 illustrates an embodiment of a system 100 for reducing memory power consumption via pre-filled RAM values. It should be appreciated that the pre-filling of RAM values may comprise initialization values or other RAM values or settings. It should be appreciated that system 100 may be implemented in any computing device, including a personal computer, a workstation, a server, a portable computing device (PCD), such as a cellular telephone, a smartphone, a portable digital assistant (PDA), a portable game console, a navigation device, a tablet computer, a wearable device, such as a sports watch, a fitness tracking device, etc., or other battery-powered, web-enabled devices. As illustrated in FIG. 1, the system 100 comprises a system on chip (SoC) 102 electrically coupled to a random access memory (RAM) via a memory bus. It should be appreciated that system 100 may incorporate various types of RAM memory devices. In the embodiment illustrated in FIG. 1, the SoC 102 is electrically coupled to a dynamic random access memory (DRAM) 104 via a DRAM bus.
- The SoC 102 comprises various on-chip components, including a central processing unit (CPU) 106, a static random access memory (SRAM) 108, read only memory (ROM) 110, a DRAM controller 112, and a storage memory controller 114 electrically coupled via SoC bus 116. The CPU 106 may support a high-level operating system (O/S) 130. DRAM controller 112 controls communication with DRAM 104 via a high-speed bus comprising address/control bus 122 and data bus 124. The SoC 102 may be electrically coupled to external storage memory 118. Storage memory controller 114 controls communication with storage memory 118.
- As known in the art, program(s) 105 executing on CPU 106 may rely on initialization values or other setting values for allocated RAM memory. The programs 105 requesting RAM settings (e.g., initialization) may comprise hardware, software, or firmware applications, application frameworks, services, etc. In an embodiment, a program 105 may request that a portion of DRAM 104 is to be initially filled with a constant value. For example, a display frame buffer, a camera frame buffer, or any other program 105 may request zero-initialization of allocated DRAM in which the memory is to be filled with all zeros. It should be appreciated, however, that DRAM initialization may involve filling DRAM 104 with any constant value (0 or 1) or any predetermined pattern of values.
- As further illustrated in FIG. 1, a physical layer 120 in DRAM 104 receives memory transactions from DRAM controller 112. Physical layer 120 is electrically coupled to decode/control logic 126, which is configured to decode the data received via address/control bus 122 and data bus 124. Decode/control logic 126 is electrically coupled to a plurality of sense amp latches 134 via two control paths. A first control path comprises a global I/O interface 136 to the sense amp latches 134, which is used for performing write transactions. A second control path comprises a fill I/O interface 128 to a fill value memory 130, which is electrically coupled to the sense amp latches 134 via a set/clear I/O interface 132.
- When performing a write to a DRAM cell array 140, write transactions result in data being written to the sense amp latches 134 via global I/O interface 136. As known in the art, sense amp latches 134 may store an entire page of data (e.g., 32768 bits). As known in the art, a write transaction comprising, for example, 256 bits, may only update a portion of a page. DRAM cell array 140 may be organized as multiple pages (e.g., 16536). Pages are “opened” by reading an entire page from DRAM cell array 140 into the sense amp latches 134. Pages are “closed” by storing an entire page from the sense amp latches 134 back into the DRAM cell array 140. In this manner, read or write transactions to update the data in the sense amp latches 134 may only occur when a page is “open”.
- It should be appreciated that the second control path (i.e., fill I/O interface 128 to fill value memory 130 and set/clear I/O interface 132 to sense amp latches 134) enables system 100 to reduce memory power consumption during DRAM initialization of large or numerous data structures, such as constants, variables, arrays, strings, etc. When program(s) 105 running on CPU 106 request a portion of DRAM 104 to be filled with constant values or a predetermined pattern of values, the CPU 106 may first load the initialization values into fill value memory 130 via fill I/O interface 128. In an embodiment, the size of the fill value memory 130 may be equal to the size of a page (e.g., 32768 bits), although the fill values may comprise, for example, a smaller number of bits (e.g., 256 bits) of repeating unique constant values. The pre-filling of the fill value memory 130 with the fill values may involve a single write transaction. The pre-filling write transaction may occur, for example, once at system boot, or may be dynamically performed as program 105 is being executed.
- When DRAM values are to be initialized, the system 100 may instruct fill value memory 130 and sense amp latches 134 to use the pre-filled initialization values to update a page that currently resides in the sense amp latches 134. As described below in more detail, in an embodiment, a fill command 502 (FIG. 5) may be generated and sent by the DRAM controller 112 to DRAM 104. The fill command 502 may be decoded by decode/control logic 126 into appropriate commands or signals that instruct the fill value memory 130 and sense amp latches 134 to use the content of the fill value memory 130 to update the page currently residing in sense amp latches 134. It should be appreciated that a portion or all of the page may be updated in accordance with the fill values. In an embodiment, the fill command 502 may be used to fill an entire page (e.g., 32768 bits) using a single fill command 502 and without transferring any data over SoC bus 116 to the DRAM bus (except when initializing the fill values), which may result in faster filling of the page and with reduced power consumption. It should be further appreciated that the 256-bit unique fill value constants can be any value, allowing flexibility in filling a page.
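The difference between the two control paths can be illustrated with a toy software model. The following is a minimal sketch only, not the disclosed circuit: the class `BankModel` and its methods are hypothetical names, the page and write sizes are the example values given above (32768 bits and 256 bits), and the byte counter is merely a rough proxy for bus traffic rather than a power model.

```python
PAGE_BYTES = 32768 // 8    # example page size from the text: 32768 bits
WRITE_BYTES = 256 // 8     # example write transaction size: 256 bits


class BankModel:
    """Hypothetical model of one bank: latches, fill value memory, bus counter."""

    def __init__(self):
        self.latches = bytearray(PAGE_BYTES)            # the open page
        self.fill_value_memory = bytearray(PAGE_BYTES)  # page-sized fill store
        self.bus_bytes = 0                              # data bytes sent over the bus

    def write(self, offset, data):
        """First control path: a write transaction carries data over the bus."""
        self.latches[offset:offset + len(data)] = data
        self.bus_bytes += len(data)

    def load_fill_value(self, pattern):
        """Pre-fill: one transaction, repeated across the fill value memory."""
        copies = PAGE_BYTES // len(pattern)
        self.fill_value_memory[:] = pattern * copies
        self.bus_bytes += len(pattern)

    def fill(self):
        """Second control path: latches are updated inside the DRAM, no bus data."""
        self.latches[:] = self.fill_value_memory


def conventional_fill(bank, pattern):
    """Fill the open page with repeated write transactions."""
    for offset in range(0, PAGE_BYTES, len(pattern)):
        bank.write(offset, pattern)


def prefilled_fill(bank, pattern):
    """Fill the open page with one pre-fill write plus one fill command."""
    bank.load_fill_value(pattern)
    bank.fill()
```

Under these assumptions, both functions leave the same page content in the latches, but the conventional path moves the full 4096 bytes over the modeled bus while the pre-filled path moves only the 32-byte pattern, and only when the fill values are first loaded.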
- FIG. 2 illustrates an embodiment of a method 200 implemented in system 100 for reducing memory power consumption during DRAM value initialization. At block 202, an executing program 105 may initiate a fill request to fill an allocated portion of DRAM 104 with a predetermined pattern of initialization values. The fill request may be received and/or processed by one or more of O/S 130, double data rate (DDR) driver 107, and DRAM controller 112. At block 204, the predetermined pattern of initialization values may be pre-filled by storing them in fill value memory 130 residing in DRAM 104. The pre-filling may occur at system boot or during program execution. At block 206, a fill command 502 may be generated and received by DRAM 104. In response to fill command 502, at block 208, a plurality of sense amp latches 134 may be connected to fill value memory 130 to update the corresponding sense amp latch bits with the predetermined pattern of initialization values stored in fill value memory 130.
- As mentioned above, when DRAM values are to be initialized, the system 100 instructs fill value memory 130 and sense amp latches 134 to use the pre-filled initialization values to update a page that currently resides in the sense amp latches 134. FIG. 3 illustrates a conventional circuit diagram of a sense amp latch for implementing a single bit within a page. FIG. 4 illustrates an improved circuit diagram for providing DRAM initialization as described above. Referring to FIG. 3, a conventional circuit comprises n-channel field effect transistors (nFETs) 304 and 308, and p-channel field effect transistors (pFETs) 302, 306, 310, and 312, which are electrically connected to bitlines 318 and 320, a positive supply voltage (VDD) 314, and a negative supply voltage (VSS) 316. Bitlines 318 and 320 are electrically coupled to an array I/O interface 138 to DRAM cell array 140. A data I/O interface 324 is electrically coupled to the global I/O interface 136. In operation, an enable signal 322 is provided to transistors […] DRAM cell array 140 via array I/O interface 138. Data I/O interface 324 connects the global I/O interface 136 externally to decode/control logic 126 (commonly referred to as peripheral circuitry). Each sense amp latch bit stores a value for one bit in the page. The value may be updated whenever opening a page or writing to the bit during a write transaction. The value can be read at any time.
- FIG. 4 shows a modification to the conventional circuit diagram to support the DRAM initialization provided by system 100. The exemplary modified circuit diagram further comprises a pFET 402 and an nFET 404 that may be driven by a set/clear_not signal 406 provided on set/clear I/O interface 132 to allow the fill value memory 130 to update the value of the sense amp latch bit. In response to the set/clear_not signal 406, the sense amp latch bit is electrically coupled to the corresponding value stored in the fill value memory 130. In this manner, each of the sense amp latch bits within a page (e.g., 32,768 bits) may be simultaneously updated. It should be appreciated that the exemplary circuit diagram of FIG. 4 is merely one possible implementation. Other circuit diagrams may be implemented, and the specific combination and quantity of field effect transistor (FET) types and/or bipolar transistors is not restricted to this example. Furthermore, in other embodiments, an input multiplexer may be used, such as on the data I/O interface 324 (not shown), where a 2-input multiplexer may replace transistors […] clear_not signal 406 to drive either a 1 or 0 onto the data I/O interface 324.
- FIG. 5 is a flow/block diagram illustrating the data structure and operation of an embodiment of a memory fill command 502 for updating the sense amp latch bits with the values stored in the fill value memory 130. When a fill is initiated, the fill command 502 may be generated and sent by DRAM controller 112. In the embodiment of FIG. 5, memory fill command 502 may include a physical bank address 504, a start group 506, and an end group 508. As known in the art, DRAM 104 may comprise one or more DRAM chips with each chip having a plurality of banks (e.g., 8 banks per DRAM chip). The physical bank address 504 may direct the fill operation to one of the physical banks. Each page may comprise 32,768 bits. Start group 506 and end group 508 may specify a contiguous portion of the page that will be filled using the fill values contained in the fill value memory 130. As illustrated in FIG. 5, the page may be organized into a plurality of groups (e.g., 128 groups where each group comprises 32 bytes). In one exemplary implementation, if start group 506 specifies group 0 and end group 508 specifies group 127, then the entire page will be filled from fill value memory 130. In another example, if start group 506 specifies group 126 and end group 508 specifies group 127, then only the upper 512 bits (64 bytes) of the page will be filled, and the lower 32256 bits (4032 bytes) of the page will remain untouched. In this regard, arrow 510 demonstrates that the fill command 502 controls which groups are to be filled. Arrow 512 illustrates the data transfer from the sense amp latches 134 to DRAM cell array 140. When “closing” a page, all sense amp latch data is transferred back to DRAM cell array 140. When “opening” a page, the opposite occurs (i.e., the sense amp latch data is filled with the content from DRAM cell array 140).
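The start group and end group semantics can be checked with a short sketch. This is an illustrative model under stated assumptions, not the command encoding of the disclosure: `FillCommand` and `apply_fill` are hypothetical names, the end group is treated as inclusive (consistent with the group 0 to group 127 example), and the bank field is carried but not used because only one bank is modeled.

```python
from dataclasses import dataclass

GROUP_BYTES = 32                             # example group size from the text
GROUPS_PER_PAGE = 128                        # example group count from the text
PAGE_BYTES = GROUP_BYTES * GROUPS_PER_PAGE   # 4096 bytes = 32,768 bits


@dataclass(frozen=True)
class FillCommand:
    """Hypothetical software view of the three fields of a fill command."""
    bank: int          # physical bank address (unused in this one-bank model)
    start_group: int   # first group to fill
    end_group: int     # last group to fill, inclusive


def apply_fill(cmd, latches, fill_value_memory):
    """Copy the selected groups from the fill value memory into the latches.

    Returns the number of bytes updated; all other bytes stay untouched.
    """
    if not 0 <= cmd.start_group <= cmd.end_group < GROUPS_PER_PAGE:
        raise ValueError("group range outside the page")
    lo = cmd.start_group * GROUP_BYTES
    hi = (cmd.end_group + 1) * GROUP_BYTES
    latches[lo:hi] = fill_value_memory[lo:hi]
    return hi - lo
```

With these example sizes, a command for groups 126 through 127 updates the upper 64 bytes (512 bits) and leaves the lower 4032 bytes (32256 bits) unchanged, matching the arithmetic in the paragraph above.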
- FIG. 6 is a flow/block diagram illustrating the data structure and operation of an embodiment of a DRAM initialization command 602 for initializing the fill values stored in the fill value memory 130. Prior to the fill command 502 being used, the fill value memory 130 may be initialized. The DRAM controller 112 may configure and send DRAM initialization command 602 to DRAM 104. In an embodiment, DRAM initialization command 602 comprises a physical bank address 604 and a fill value 606. It should be appreciated that the 32-byte fill value 606 may comprise any unique value. Following the above example from FIG. 5, as illustrated by reference numeral 607, the fill value 606 may comprise a 256-bit or 32-byte value used to initialize each of the 128 32-byte groups in the 4K byte fill value memory 130 specified by physical bank address 604 to the same value. For example, bit 0 of the 32-byte fill value may be written to bit 0 of group 0, to bit 0 of group 1, to bit 0 of group 2, and so forth. It should be appreciated that, in other embodiments, instead of having a unique fill value and fill value memory 130 for each bank, there can be a single fill value memory 130 that serves all banks in the same memory chip. It should be appreciated that, in other embodiments, instead of repeating the 32-byte fill value across each of the 128 groups, there can be an additional parameter in DRAM initialization command 602, which selectively applies the 32-byte fill value to a subset of the 128 groups, allowing greater flexibility when filling the fill value memory 130. In further embodiments, the choice of 32 bytes as the unit for transacting fill values and groups can be adjusted larger or smaller in size where necessary.
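The replication of the fill value across the groups may be sketched as follows. This is a minimal illustration, assuming the example sizes above; `apply_init_command` is a hypothetical name, and its optional `groups` argument is an assumed stand-in for the additional subset parameter mentioned for other embodiments, whose actual encoding is not specified in the text.

```python
GROUP_BYTES = 32                             # example unit for fill values
GROUPS_PER_PAGE = 128                        # example number of groups
PAGE_BYTES = GROUP_BYTES * GROUPS_PER_PAGE   # 4K byte fill value memory


def apply_init_command(fill_value_memory, fill_value, groups=None):
    """Write one 32-byte fill value into the selected groups.

    By default every group receives the same value, so bit 0 of the fill
    value lands on bit 0 of group 0, group 1, group 2, and so forth.
    Passing `groups` (an assumption, see lead-in) restricts the update
    to a subset of the groups.
    """
    if len(fill_value) != GROUP_BYTES:
        raise ValueError("fill value must be exactly one group in size")
    if groups is None:
        groups = range(GROUPS_PER_PAGE)
    for g in groups:
        if not 0 <= g < GROUPS_PER_PAGE:
            raise ValueError("group index outside the fill value memory")
        start = g * GROUP_BYTES
        fill_value_memory[start:start + GROUP_BYTES] = fill_value
```

Selecting a subset of groups allows the fill value memory to hold more than one pattern at a time, for example zeros in some groups and a different constant in the rest, at the cost of additional initialization commands.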
- FIG. 7 illustrates an exemplary control flow in system 100 for implementing DRAM initialization. The DDR fill driver 107 may provide a software interface between the executing program(s) 105 and DRAM controller 112. Program 105 may initiate DRAM initialization by making a call (reference numeral 702) to DDR fill driver 107. DDR fill driver 107 may translate the program's fill request into parameters that will instruct the DRAM controller 112 (reference numeral 704) to execute the fill command 502 on DRAM 104 (reference numeral 706).
- FIG. 8 is a flowchart illustrating the architecture, operation, and/or functionality of an embodiment of the DDR fill driver 107. At block 802, the DDR fill driver 107 may receive a fill request from a program executed by the CPU 106 on SoC 102. At decision block 804, the DDR fill driver 107 may check to determine if the fill request is for contiguous addresses. If “no”, at block 806, the DDR fill driver 107 may break the fill request into multiple pages. If “yes”, at block 808, the DDR fill driver 107 may perform group alignment, resize as needed, and generate a plurality of fill transactions. At block 810, DDR fill driver 107 may convert virtual addresses to physical addresses for each fill transaction. At block 812, the DDR fill driver 107 may queue DRAM controller 112 for the fill transactions. The queuing operation may include the opening/closing of new pages (e.g., if the original request spans multiple pages).
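The group alignment and per-page splitting performed by the driver may be sketched as below. This is a simplified illustration under several assumptions that are not stated in the text: the request is already a contiguous physical byte range, a page index is taken as the physical address divided by the 4096-byte page size (a real controller maps address bits to bank and row differently), and any unaligned head or tail bytes are returned separately on the assumption that they would be handled by ordinary writes. The name `plan_fill` is hypothetical.

```python
GROUP_BYTES = 32     # example group size
PAGE_BYTES = 4096    # example page size (32,768 bits)


def plan_fill(base, size):
    """Split a contiguous byte range into per-page fill transactions.

    Returns (transactions, leftovers) where each transaction is a tuple
    (page_index, start_group, end_group) with an inclusive end group, and
    leftovers is a list of (start, end) byte ranges that are not aligned
    to a group boundary.
    """
    end = base + size
    aligned_start = -(-base // GROUP_BYTES) * GROUP_BYTES  # round up
    aligned_end = (end // GROUP_BYTES) * GROUP_BYTES       # round down
    if aligned_start >= aligned_end:
        # No whole group is covered; nothing can use the fill path.
        return [], ([(base, end)] if size > 0 else [])

    leftovers = []
    if base < aligned_start:
        leftovers.append((base, aligned_start))
    if aligned_end < end:
        leftovers.append((aligned_end, end))

    transactions = []
    addr = aligned_start
    while addr < aligned_end:
        page = addr // PAGE_BYTES
        page_end = min((page + 1) * PAGE_BYTES, aligned_end)
        start_group = (addr % PAGE_BYTES) // GROUP_BYTES
        end_group = ((page_end - 1) % PAGE_BYTES) // GROUP_BYTES
        transactions.append((page, start_group, end_group))
        addr = page_end
    return transactions, leftovers
```

For instance, under these assumptions a request starting at byte 4032 with a size of 4192 bytes would produce three transactions: groups 126 to 127 of page 0, groups 0 to 127 of page 1, and group 0 of page 2, each of which would require its page to be open when the fill command is issued.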
FIG. 9 is a flow/block diagram illustrating another embodiment of a method 900 implemented in the system 100 for reducing memory power consumption by pre-filling DRAM values. The method 900 illustrates the flow between program(s) 105 initiating a fill request, O/S 130, DDR fill driver 107, DRAM controller 112, a device interface 901, and cell array 140. It should be appreciated that, for purposes of simplifying the flow diagram, the device interface 901 may comprise a combination of one or more of the following: PHY 120, decode and control module 126, fill value memory 130, and sense amp latches 134. The initial conditions may assume that an activation has previously occurred and that the sense amp latches 134 currently hold a valid page to be subsequently filled as described herein. At reference numeral 902, program 105 may send a request to the O/S 130 to allocate a portion of DRAM memory to be initialized with a predetermined pattern of values. At reference numeral 904, O/S 130 may respond by allocating memory with a requested size, and initialize the memory by using O/S function(s), such as, for example, a calloc( ) or memset( ) function. At reference numeral 906, O/S 130 may initiate a call to DDR fill driver 107. The call may comprise a memFill( ) function, which specifies a base physical address and a size. As mentioned above in connection with FIG. 8, in response to the call, the DDR fill driver 107 may check for contiguous addresses (reference numeral 908), break down the request into multiple pages (reference numeral 910), perform group alignment and resizing (reference numeral 912), and perform virtual-to-physical address translation (reference numeral 914). As illustrated in FIG. 9, the DDR fill driver 107 may initiate a control loop 915 which may be repeated for each row until the whole memory buffer is filled with the predetermined pattern.
At reference numeral 916, the DDR fill driver 107 may write to control registers. At reference numeral 918, DRAM controller 112 may convert addresses to a bank number and a row number. At reference numeral 920, DRAM controller 112 may initiate the DRAM initialization command 602 to device interface 901. At reference numeral 922, the fill value memory 130 may be pre-filled with the unique fill value 606. At reference numeral 924, DRAM controller 112 may initiate a fill command 502 to device interface 901. At reference numeral 926, the page may be written to memory cell array 140 with the pre-filled values.
As mentioned above, the fill command 502 may specify that only a portion or all of a page is to be updated in accordance with pre-filled values stored in fill value memory 130. In this regard, it should be appreciated that, in another embodiment, a first control loop 915 may be used to fill a first portion of allocated DRAM, while one or more further control loops 915 may be used to fill additional portions. In one example, a first control loop 915 fills a first portion with a first pattern of values, and a second control loop 915 fills a second portion with a second pattern of values.
One of ordinary skill in the art will appreciate that the systems and methods described above may be leveraged for use in performing an enhanced page copy operation. O/S 130 may employ a copy-on-write (COW) mechanism whereby multiple processes can reference a 4 KB page as read-only up until the time that they need to modify the content of the page. When the need for modification arises, O/S 130 may create a copy of the original page and permit write access to the process. At this point, there will be two pages: (1) the original page which has write access from the original process; and (2) the copied page which has write access from the new process. COW is desirable in modern systems because it is economical on memory space (i.e., copies are only created when a process needs to modify the original reference). The traditional method for copying one page to another is memcpy( ), which simply copies N bytes from a source address to a destination address. This may consume more energy than a traditional memory fill because this operation may use both reads and writes and is generally done with N=64 or 128 bytes at a time (i.e., read, write, read, write, etc.). For a full page COW, this amounts to 4 KB of copy. As described above, the conventional method involves the CPU, the interconnect, and the DDR bus, which all contribute to power consumption.
Using the above-described methods and systems, there may be page copies of certain types of COW pages (e.g., zero-filled arrays, image and surface buffers, etc.), where during COW, a modified memcpy( ) may read M=4096 bytes from the source address while leveraging the DDR fill driver 107 and the sense amp latch structure to complete the fill (copy). O/S 130 may already be aware of the 4K page contents. For example, there may be a Z flag set, which indicates all zeros. Alternatively, during the read phase, DRAM controller 112 may detect if the copy operation can benefit from pre-filling. In this regard, the DRAM controller 112 may support hardware monitoring, which performs in-flight comparison of the read data, looks for zero and/or repeating values, and provides software-readable status on the outcome. For example, the hardware checks if all or a portion of the 4K bytes of read data are equal to the first byte. Another implementation may only check for all zero values. Further checking may determine if a meaningful portion (⅛, ¼, etc. but not 1/32) of the 4K byte page has identical values. In other embodiments, instead of read, write, read, write, a modified memcpy( ) using this technique may assign a portion of internal cache RAM or other SRAM 108 to use as a temporary buffer to read the entire 4K bytes of read data. Upon completion of the 4K byte read, the DRAM controller 112 hardware status may be checked, and the above-described methods and systems may be initiated to perform the 4K byte write if the opportunity to save energy presents itself (e.g., if the 4K bytes read were all zero values or all repeating values). If the status of the DRAM controller 112 hardware does not indicate an opportunity of zero or repeating values, then conventional 4K byte writes may complete the memcpy( ).
Memory accesses from various clients may be interleaved and discontiguous, so the hardware may be aware of the physical addresses when calculating the status. The hardware can also inspect a standard bus transaction master ID to whitelist clients (e.g., ignore all clients except for the CPU). The COW may use this information to determine whether the pre-filling will benefit the modified memcpy( ).
If it can (e.g., the copied 4K are all zero), then pre-filling DRAM values may improve energy efficiency of the overall COW operation. If it cannot (e.g., the copied 4K are random), then a conventional memcpy( ) may be employed.
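As a rough software illustration of the decision described above, the following sketch classifies a 4 KB page that has already been read and chooses between the pre-fill path and a conventional copy. It is a sketch under assumptions only: the description places the comparison in DRAM controller hardware performed in flight, whereas here it is modeled in software, and the names classify_page, cow_copy, prefill_page, and write_page are hypothetical. Only the simplest check (every byte equals the first byte) is modeled; partial-page matches are not.

```python
PAGE_SIZE = 4096  # 4 KB page, as in the description


def classify_page(data):
    """Return the repeated byte value if every byte equals the first, else None."""
    if len(data) != PAGE_SIZE:
        raise ValueError("expected exactly one 4 KB page")
    first = data[0]
    return first if data.count(first) == PAGE_SIZE else None


def cow_copy(src, prefill_page, write_page):
    """Copy one page, using the pre-fill path when the page is a single repeated value.

    `prefill_page(value)` stands in for issuing a fill with the given value;
    `write_page(data)` stands in for conventional writes. Both are hypothetical
    callbacks supplied by the caller.
    """
    value = classify_page(src)
    if value is None:
        write_page(src)      # random content: fall back to a conventional copy
        return "conventional"
    prefill_page(value)      # all-zero or repeating content: fill instead of copy
    return "prefill"
```

A zero-filled page and a page of one repeated byte both take the pre-fill path in this sketch, while a page with mixed contents falls back to the conventional write path.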
As mentioned above, the system 100 may be incorporated into any desirable computing system. FIG. 10 illustrates the system 100 incorporated in an exemplary portable computing device (PCD) 1000. It will be readily appreciated that certain components of the system 100 may be included on the SoC 322 (e.g., DDR fill driver 107, DRAM controller 112) while other components (e.g., DRAM 104) may be external components coupled to the SoC 322. The SoC 322 may include a multicore CPU 1002. The multicore CPU 1002 may include a zeroth core 1010, a first core 1012, and an Nth core 1014. One of the cores may comprise, for example, a graphics processing unit (GPU) with one or more of the others comprising the CPU.
A display controller 328 and a touch screen controller 330 may be coupled to the CPU 1002. In turn, the touch screen display 1006 external to the on-chip system 322 may be coupled to the display controller 328 and the touch screen controller 330.
FIG. 10 further shows that a video encoder 334, e.g., a phase alternating line (PAL) encoder, a sequential color a memoire (SECAM) encoder, or a national television system(s) committee (NTSC) encoder, is coupled to the multicore CPU 1002. Further, a video amplifier 336 is coupled to the video encoder 334 and the touch screen display 1006. Also, a video port 338 is coupled to the video amplifier 336. As shown in FIG. 10, a universal serial bus (USB) controller 340 is coupled to the multicore CPU 1002. Also, a USB port 342 is coupled to the USB controller 340.
Further, as shown in FIG. 10, a digital camera 348 may be coupled to the multicore CPU 1002. In an exemplary aspect, the digital camera 348 is a charge-coupled device (CCD) camera or a complementary metal-oxide semiconductor (CMOS) camera.
As further illustrated in FIG. 10, a stereo audio coder-decoder (CODEC) 350 may be coupled to the multicore CPU 1002. Moreover, an audio amplifier 352 may be coupled to the stereo audio CODEC 350. In an exemplary aspect, a first stereo speaker 354 and a second stereo speaker 356 are coupled to the audio amplifier 352. FIG. 10 shows that a microphone amplifier 358 may also be coupled to the stereo audio CODEC 350. Additionally, a microphone 360 may be coupled to the microphone amplifier 358. In a particular aspect, a frequency modulation (FM) radio tuner 362 may be coupled to the stereo audio CODEC 350. Also, an FM antenna 364 is coupled to the FM radio tuner 362. Further, stereo headphones 366 may be coupled to the stereo audio CODEC 350.
FIG. 10 further illustrates that a radio frequency (RF) transceiver 368 may be coupled to the multicore CPU 1002. An RF switch 370 may be coupled to the RF transceiver 368 and an RF antenna 372. A keypad 374 may be coupled to the multicore CPU 1002. Also, a mono headset with a microphone 376 may be coupled to the multicore CPU 1002. Further, a vibrator device 378 may be coupled to the multicore CPU 1002.
FIG. 10 also shows that a power supply 380 may be coupled to the on-chip system 322. In a particular aspect, the power supply 380 is a direct current (DC) power supply that provides power to the various components of the PCD 1000 that require power. Further, in a particular aspect, the power supply is a rechargeable DC battery or a DC power supply that is derived from an alternating current (AC) to DC transformer that is connected to an AC power source.
FIG. 10 further indicates that the PCD 1000 may also include a network card 388 that may be used to access a data network, e.g., a local area network, a personal area network, or any other network. The network card 388 may be a Bluetooth network card, a WiFi network card, a personal area network (PAN) card, a personal area network ultra-low-power technology (PeANUT) network card, a television/cable/satellite tuner, or any other network card well known in the art. Further, the network card 388 may be incorporated into a chip, i.e., the network card 388 may be a full solution in a chip, and may not be a separate network card 388.
As depicted in FIG. 10, the touch screen display 1006, the video port 338, the USB port 342, the camera 348, the first stereo speaker 354, the second stereo speaker 356, the microphone 360, the FM antenna 364, the stereo headphones 366, the RF switch 370, the RF antenna 372, the keypad 374, the mono headset 376, the vibrator 378, and the power supply 380 may be external to the on-chip system 322.
It should be appreciated that one or more of the method steps described herein may be stored in the memory as computer program instructions, such as the modules described above. These instructions may be executed by any suitable processor in combination or in concert with the corresponding module to perform the methods described herein.
Certain steps in the processes or process flows described in this specification naturally precede others for the invention to function as described. However, the invention is not limited to the order of the steps described if such order or sequence does not alter the functionality of the invention. That is, it is recognized that some steps may be performed before, after, or in parallel with (substantially simultaneously with) other steps without departing from the scope and spirit of the invention. In some instances, certain steps may be omitted or not performed without departing from the invention. Further, words such as “thereafter”, “then”, “next”, etc. are not intended to limit the order of the steps. These words are simply used to guide the reader through the description of the exemplary method.
Additionally, one of ordinary skill in programming is able to write computer code or identify appropriate hardware and/or circuits to implement the disclosed invention without difficulty based on the flow charts and associated description in this specification, for example.
Therefore, disclosure of a particular set of program code instructions or detailed hardware devices is not considered necessary for an adequate understanding of how to make and use the invention. The inventive functionality of the claimed computer implemented processes is explained in more detail in the above description and in conjunction with the Figures which may illustrate various process flows.
In one or more exemplary aspects, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted as one or more instructions or code on a computer-readable medium. Computer-readable media include both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. Storage media may be any available media that may be accessed by a computer. By way of example, and not limitation, such computer-readable media may comprise RAM, ROM, EEPROM, NAND flash, NOR flash, M-RAM, P-RAM, R-RAM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that may be used to carry or store desired program code in the form of instructions or data structures and that may be accessed by a computer.
Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (“DSL”), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium.
Disk and disc, as used herein, includes compact disc (“CD”), laser disc, optical disc, digital versatile disc (“DVD”), floppy disk and blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
Alternative embodiments will become apparent to one of ordinary skill in the art to which the invention pertains without departing from its spirit and scope. Therefore, although selected aspects have been illustrated and described in detail, it will be understood that various substitutions and alterations may be made therein without departing from the spirit and scope of the present invention, as defined by the following claims.
Claims (30)
1. A method for setting values of a dynamic random access memory, the method comprising:
receiving from an executing program a fill request to fill an allocated portion of a dynamic random access memory with a predetermined pattern of values;
storing the predetermined pattern of values in a fill value memory residing in the DRAM;
sending a fill command to the DRAM; and
in response to the fill command, connecting a plurality of sense amp latches to the fill value memory to update the corresponding sense amp latch bits with the predetermined pattern of values stored in the fill value memory.
2. The method of claim 1, wherein the storing the predetermined pattern of values in the fill value memory comprises a write transaction to the DRAM.
3. The method of claim 2, wherein the write transaction occurs during the executing program.
4. The method of claim 1, wherein the fill command comprises a physical bank address identifying one of a plurality of DRAM banks to be filled using the predetermined pattern of values stored in the fill value memory.
5. The method of claim 4, wherein the fill command further comprises a start group and an end group defining a contiguous portion of a page to be filled using the predetermined pattern of values stored in the fill value memory.
6. The method of claim 1, wherein the predetermined pattern of values comprises a programmable constant value.
7. The method of claim 1, wherein the connecting the plurality of sense amp latches to the fill value memory comprises sending a set/clear_not signal to the plurality of sense amp latches.
8. The method of claim 1, wherein the plurality of sense amp latches corresponds to a page.
9. The method of claim 8, further comprising: writing the page to a memory cell array.
10. A system for setting values of a dynamic random access memory, the system comprising:
means for receiving from an executing program a fill request to fill an allocated portion of a dynamic random access memory (DRAM) with a predetermined pattern of values;
means for storing the predetermined pattern of values in a fill value memory residing in the DRAM;
means for sending a fill command to the DRAM; and
means for connecting, in response to the fill command, a plurality of sense amp latches to the fill value memory to update the corresponding sense amp latch bits with the predetermined pattern of values stored in the fill value memory.
11. The system of claim 10, wherein the means for storing the predetermined pattern of values in the fill value memory comprises:
means for performing a write transaction to the DRAM.
12. The system of claim 11, wherein the write transaction occurs during the executing program.
13. The system of claim 10, wherein the fill command comprises a physical bank address identifying one of a plurality of DRAM banks to be filled using the predetermined pattern of values stored in the fill value memory.
14. The system of claim 13, wherein the fill command further comprises a start group and an end group defining a contiguous portion of a page to be filled using the predetermined pattern of values stored in the fill value memory.
15. The system of claim 10, wherein the predetermined pattern of values comprises a constant value.
16. The system of claim 10, wherein the means for connecting the plurality of sense amp latches to the fill value memory comprises:
means for sending a set/clear_not signal to the plurality of sense amp latches.
17. The system of claim 10, wherein the plurality of sense amp latches corresponds to a page.
18. The system of claim 17, further comprising:
means for writing the page to a memory cell array.
19. A system for setting values of a dynamic random access memory, the system comprising:
a system on chip (SoC) comprising a processing device and a memory controller; and
a dynamic random access memory (DRAM) electrically coupled to the memory controller via a bus, the DRAM comprising:
a fill value memory for pre-filling a predetermined pattern of values for an allocated portion of a memory cell array; and
a plurality of sense amp latches electrically coupled to the fill value memory for updating corresponding sense amp latch bits with the predetermined pattern of values in response to a fill command received from the memory controller.
20. The system of claim 19, wherein the fill value memory is pre-filled with the predetermined pattern of values via a write transaction initiated by the memory controller.
21. The system of claim 20, wherein the write transaction occurs during execution of a program by the processing device that initiates a fill request to fill an allocated portion of the DRAM with the predetermined pattern of values.
22. The system of claim 19, wherein the fill command received from the memory controller comprises a physical bank address identifying one of a plurality of DRAM banks to be filled using the predetermined pattern of values stored in the fill value memory.
23. The system of claim 22, wherein the fill command further comprises a start group and an end group defining a contiguous portion of a page to be filled using the predetermined pattern of values stored in the fill value memory.
24. The system of claim 19, wherein the predetermined pattern of values comprises a constant value.
25. The system of claim 19, wherein the plurality of sense amp latch bits are updated with the predetermined pattern of values by sending a set/clear_not signal to the plurality of sense amp latches.
26. A computer program embodied in a computer-readable medium and executable by a processing device for setting values of a dynamic random access memory, the computer program comprising logic configured to:
receive from an executing program a fill request to fill an allocated portion of a dynamic random access memory (DRAM) with a predetermined pattern of values;
store the predetermined pattern of values in a fill value memory residing in the DRAM; and
instruct a plurality of sense amp latches to update corresponding sense amp latch bits with the predetermined pattern of values stored in the fill value memory.
27. The computer program of claim 26, wherein the fill command comprises a physical bank address identifying one of a plurality of DRAM banks to be filled using the predetermined pattern of values stored in the fill value memory.
28. The computer program of claim 27, wherein the fill command further comprises a start group and an end group defining a contiguous portion of a page to be filled using the predetermined pattern of values stored in the fill value memory.
29. The computer program of claim 26, wherein the predetermined pattern of values comprises a constant value.
30. The computer program of claim 26, wherein the logic configured to instruct the plurality of sense amp latches to update the corresponding sense amp latch bits with the predetermined pattern of values stored in the fill value memory comprises logic configured to: send a set/clear_not signal to the plurality of sense amp latches.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/472,622 US10090040B1 (en) | 2017-03-29 | 2017-03-29 | Systems and methods for reducing memory power consumption via pre-filled DRAM values |
PCT/US2018/016519 WO2018182848A1 (en) | 2017-03-29 | 2018-02-01 | Systems and methods for reducing memory power consumption via pre-filled dram values |
TW107106770A TW201903768A (en) | 2017-03-29 | 2018-03-01 | Systems and methods for reducing memory power consumption via pre-filled dram values |
Publications (2)
Publication Number | Publication Date |
---|---|
US10090040B1 (en) | 2018-10-02 |
US20180286473A1 (en) | 2018-10-04 |
Family
ID=61224594
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/472,622 (US10090040B1, Active 2037-03-30) | 2017-03-29 | 2017-03-29 | Systems and methods for reducing memory power consumption via pre-filled DRAM values |
Country Status (3)
Country | Link |
---|---|
US (1) | US10090040B1 (en) |
TW (1) | TW201903768A (en) |
WO (1) | WO2018182848A1 (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5896331A (en) * | 1997-12-23 | 1999-04-20 | Lsi Logic Corporation | Reprogrammable addressing process for embedded DRAM |
US6393587B2 (en) * | 1997-11-17 | 2002-05-21 | I-Tech Corporation | Deep trace memory system for a protocol analyzer |
US9110809B2 (en) * | 2013-07-03 | 2015-08-18 | Nvidia Corporation | Reducing memory traffic in DRAM ECC mode |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0446847B1 (en) | 1990-03-12 | 1998-06-17 | Nec Corporation | Semiconductor memory device having improved write function |
US6785190B1 (en) | 2003-05-20 | 2004-08-31 | Intel Corporation | Method for opening pages of memory with a single command |
US7464243B2 (en) | 2004-12-21 | 2008-12-09 | Cisco Technology, Inc. | Method and apparatus for arbitrarily initializing a portion of memory |
US9092358B2 (en) | 2011-03-03 | 2015-07-28 | Qualcomm Incorporated | Memory management unit with pre-filling capability |
US9805802B2 (en) | 2015-09-14 | 2017-10-31 | Samsung Electronics Co., Ltd. | Memory device, memory module, and memory system |
Also Published As
Publication number | Publication date |
---|---|
TW201903768A (en) | 2019-01-16 |
WO2018182848A1 (en) | 2018-10-04 |
US10090040B1 (en) | 2018-10-02 |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: QUALCOMM INCORPORATED, CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: CHUN, DEXTER; LI, YANRU; SIGNING DATES FROM 20170525 TO 20170605; REEL/FRAME: 042820/0432 |
STCF | Information on status: patent grant | Free format text: PATENTED CASE |
MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY. Year of fee payment: 4 |