CN116486887A - Aliased row hammer detector

Info

Publication number: CN116486887A
Application number: CN202310050896.3A
Authority: CN (China)
Legal status: Pending
Prior art keywords: memory, counters, row, counter, media device
Other languages: Chinese (zh)
Inventors: E·吉斯克, C·德里克, R·M·沃克, S·艾亚普利迪, N·伊佐, M·盖格, 陆洋, A·艾卡尔, E·C·科珀·巴利斯, D·卡拉乔
Original Assignee: Micron Technology Inc
Current Assignee: Micron Technology Inc
Priority claimed from US 17/941,655 (published as US 2023/0238046 A1)
Application filed by Micron Technology Inc
Publication of CN116486887A

Classifications

    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C 29/00 Checking stores for correct operation; subsequent repair; testing stores during standby or offline operation
    • G11C 29/04 Detection or location of defective memory elements, e.g. cell construction details, timing of test signals
    • G11C 29/08 Functional testing, e.g. testing during refresh, power-on self testing [POST] or distributed testing
    • G11C 29/12 Built-in arrangements for testing, e.g. built-in self testing [BIST] or interconnection details
    • G11C 29/56 External testing equipment for static stores, e.g. automatic test equipment [ATE]; interfaces therefor
    • G11C 29/56008 Error analysis, representation of errors


Abstract

The present disclosure relates to an aliased row hammer detector. Energy-efficient and area-saving mitigation of errors in memory media devices caused by row hammer attacks and the like is described. Error detection is performed deterministically while maintaining, in SRAM, a number of row access counters that is less than the total number of protected rows in the memory media device. The reduction in the number of counters required is achieved by aliasing a number of protected rows into each counter. Mitigation may be implemented on a per-group, per-channel, or per-memory-media-device basis. The memory media device may be a DRAM.

Description

Aliased row hammer detector
Cross Reference to Related Applications
The present application claims priority from U.S. provisional application No. 63/302,051, filed on January 22, 2022, the contents of which are hereby incorporated by reference. In addition, the present application relates to the following commonly assigned U.S. patent applications: attorney docket No. 2021139975-US-3, entitled "Memory Media Row Activation-Biased Cache"; attorney docket No. 2021140001-US-2, entitled "RHR Interrupts to the Operating System"; attorney docket No. 2021140197-US-2, entitled "Practical Space Saving Row Hammer Detector"; attorney docket No. 2021140206-US-2, entitled "Area Optimized RHR Solution for the CXL Controller"; attorney docket No. 2021140514-US-2, entitled "Optimized Control of Commands Running in a Cache"; attorney docket No. 2021140514-US-3, entitled "Control of the Back Pressure Based on a Total Number of Buffered Read and Write Entries"; and attorney docket No. 2021140514-US-4, entitled "Arbitration Policy to Prioritize Read Command Dequeuing by Delaying Write Command Dequeuing", the contents of each of which are hereby incorporated by reference.
Technical Field
The present disclosure relates to deterministic detection of row hammer errors in memory media.
Background
Memory devices (also referred to as "memory media devices") are widely used to store information in various electronic devices such as computers, user devices, wireless communication devices, cameras, digital displays, and the like. Information is stored by programming memory cells within a memory device to various states. For example, a binary memory cell may be programmed to one of two supported states, typically corresponding to a logic 1 or a logic 0. In some examples, a single memory cell may support more than two possible states, any one of which may be stored by the memory cell. To access information stored by a memory device, a component may read or sense the state of one or more memory cells within the device. To store information, a component may write or program one or more memory cells within the device to a corresponding state.
There are various types of memory devices, including magnetic hard disks, Random Access Memory (RAM), Read Only Memory (ROM), Dynamic RAM (DRAM), Synchronous Dynamic RAM (SDRAM), Static RAM (SRAM), flash memory, and others. Memory devices may be volatile or nonvolatile. Volatile memory cells (e.g., DRAM cells) lose their programmed state over time unless they are periodically refreshed while supplied with external power. SRAM may maintain its programmed state for the duration of system power-up. Nonvolatile memory cells (e.g., NAND memory cells) can maintain their programmed state for long periods even in the absence of an external power source.
The memory device may be coupled to a host (e.g., a host computing device) to store data, commands, and/or instructions for use by the host when the computer or other electronic system is operating. For example, data, commands, and/or instructions may be transferred between a host and a memory device during operation of a computing or other electronic system. A controller, referred to as a "memory controller," may be used to manage the transfer of data, commands, and/or instructions between a host and a memory device.
Disclosure of Invention
In one aspect, the present disclosure relates to an apparatus comprising: a memory media device interface configured to connect to a memory media device; at least one memory comprising a plurality of counters, wherein a total number of counters of the plurality of counters is less than a total number of rows in the memory media device that are available for data storage, and each counter of the plurality of counters is associated with a respective group of rows in the memory media device that are available for data storage; and a first circuit configured to perform operations comprising: reading a counter of the plurality of counters based on the row identifier of the memory media access request; incrementing a count value of a counter; triggering a response to an error in the memory media device if the incremented count value exceeds a configured trigger threshold; and writing the incremented count value to the counter.
In another aspect, the present disclosure is directed to a method comprising: configuring a plurality of counters in at least one memory, wherein a total number of counters of the plurality of counters is less than a total number of rows in the memory media device that are available for data storage, and each counter of the plurality of counters is associated with a respective group of rows in the memory media device that are available for data storage; reading, by the first circuit, a counter of the plurality of counters based on the row identifier of the memory media access request; incrementing a count value of a counter by a first circuit; triggering, by the first circuit, a response to an error in the memory media device if the incremented count value exceeds a configured trigger threshold; and writing, by the first circuit, the incremented count value to the counter.
In yet another aspect, the present disclosure is directed to a system comprising: a host system; a memory media device; and a memory controller, comprising: a first interface configured to connect to a host system; a second interface configured to connect to a memory media device; at least one memory comprising a plurality of counters, wherein a total number of counters of the plurality of counters is less than a total number of rows in the memory media device that are available for data storage, and each counter of the plurality of counters is associated with a respective group of rows in the memory media device that are available for data storage; and circuitry configured to perform operations comprising: reading a counter of the plurality of counters based on the row identifier of the memory media access request; incrementing a count value of a counter; triggering a response to an error in the memory media device if the incremented count value exceeds a configured trigger threshold; and writing the incremented count value to the counter.
Drawings
FIGS. 1A-1C illustrate example functional block diagrams in the form of a computing system including a memory controller configured to detect row hammer attacks using an aliased row activation counter, according to some example embodiments of the present disclosure.
FIG. 2A shows a schematic diagram of a memory bank in a DRAM media device.
FIG. 2B shows a flowchart depicting a basic implementation flow of row hammer mitigation.
FIG. 3 graphically illustrates example distributions of row hammer events at the global level, at the channel level, and at the group level in the memory controller.
FIG. 4 illustrates a logical block diagram of a row hammer mitigation component implemented at the per-group level, in accordance with some embodiments.
FIG. 5 schematically illustrates circuitry for row hammer mitigation using an aliased row activation counter, used for example in the devices shown in FIGS. 1A-1C, in accordance with some embodiments.
FIG. 6A schematically illustrates an aliased row activation counter organized in an SRAM, used in an implementation such as that shown in FIG. 5, according to some embodiments.
FIG. 6B illustrates a table showing the effect of the aliasing factor on the aliased row activation counter table size, according to some example embodiments.
FIG. 7 illustrates a flowchart showing a process for row hammer mitigation using aliased row activation counter update and comparison in a circuit such as that shown in FIG. 5, according to some example embodiments.
FIGS. 8A-8C illustrate ping-pong implementations for initializing an aliased row activation counter in a row hammer mitigation implementation such as that shown in FIG. 5, according to some example embodiments.
FIG. 9 illustrates another implementation for initializing an aliased row activation counter in a row hammer mitigation implementation such as that shown in FIG. 5, according to some example embodiments.
FIG. 10 illustrates a table of different row hammer attack responses that may be implemented in accordance with some example embodiments.
Detailed Description
The present disclosure describes systems, apparatus, and methods related to a detector for memory media soft errors, such as row hammer errors. The detector, sometimes referred to herein as a row hammer detector, is configured to perform deterministic detection of row hammer attacks in a DRAM medium in an energy efficient and space-saving manner.
In the following detailed description of the present disclosure, reference is made to the accompanying drawings that form a part hereof, and in which is shown by way of illustration how one or more embodiments of the disclosure may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the embodiments of this disclosure, and it is to be understood that other embodiments may be utilized and that process, electrical, and structural changes may be made without departing from the scope of the present disclosure.
In some embodiments, the row hammer detector is located in a "memory controller". The memory controller may coordinate execution of operations to write data into at least one of the plurality of types of memory devices.
Fig. 1A illustrates an example functional block diagram in the form of a computing system 101 including a memory controller 100 configured to detect row hammer attacks, according to some embodiments of the present disclosure. Computing system 101 may detect and mitigate row hammer attacks on one or more memory devices 126. The memory controller 100 includes a front end section 104, a central controller section 110, a back end section 119, and a management unit 135. The memory controller 100 may be coupled to a host 103 (i.e., host system 103) and a memory device 126. In some embodiments, memory device 126 may be a DRAM device.
Front end portion 104 includes an interface 106 for coupling memory controller 100 to host 103 through one or more input/output (I/O) lanes 102. Communication over I/O lane 102 may be in accordance with a protocol such as peripheral component interconnect express (PCIe). In some embodiments, the plurality of I/O lanes 102 may be configured as a single port. Example embodiments are not limited by the number of I/O lanes, whether the I/O lanes belong to a single port, or the communication protocol with which the host communicates.
Interface 106 receives data and/or commands from host 103 through I/O lane 102. In an embodiment, interface 106 is a Physical (PHY) interface configured for PCIe communications. The front-end portion 104 may include interface management circuitry 108 (including data links and transaction controls) that may provide high-level protocol support for communication with the host 103 over the PHY interface 106.
The central controller portion 110 is configured to control the execution of memory operations in response to receiving a request or command from the host 103. A memory operation may be an operation that reads data from the memory device 126 or writes data to the memory device 126. The central controller portion 110 may include a cache memory 112 that stores data associated with performing memory operations, and a security component 114 configured to encrypt data prior to its storage in, and decrypt data after its reading from, the memory device 126.
In some embodiments, in response to receiving a request from the host 103, data from the host 103 may be stored in a cache line of the cache memory 112. Data in the cache memory may be written to the memory device 126. Error correction component 116 is configured to provide error correction for data read from and/or written to memory device 126. In some embodiments, the data may be encrypted using an encryption protocol, such as Advanced Encryption Standard (AES) encryption, before the data is stored in the cache memory. In some embodiments, central controller portion 110 may control writing of multiple pages of data to memory device 126 substantially simultaneously in response to receiving a request from host 103.
The management unit 135 is configured to control the operation of the memory controller 100. The management unit may recognize commands from the host 103 and manage the one or more memory devices 126 accordingly. In some embodiments, the management unit 135 includes an I/O bus 138 that manages out-of-band data, a management unit controller 140 that performs functions including, but not limited to, executing firmware that monitors and configures characteristics of the memory controller 100, and a management unit memory 142 that stores data associated with memory controller 100 functions. The management unit controller 140 may also execute instructions associated with initializing and configuring the characteristics of the memory controller 100. An endpoint of the management unit 135 may be exposed to the host system 103 to manage data via a communication channel using the I/O bus 138.
A second endpoint of the management unit 135 may be exposed to the host system 103 to manage data over the communication channel using the interface 106. In some embodiments, the characteristics monitored by the management unit 135 may include the voltage supplied to the memory controller 100 or the temperature measured by an external sensor, or both. Further, the management unit 135 may include a local bus interconnect 136 that couples the different components of the memory controller 100. In some embodiments, the local bus interconnect 136 may include, but is not limited to, an advanced high performance bus (AHB).
The management unit 135 may include a management unit controller 140. In some embodiments, the management unit controller 140 may be a controller that conforms to the Joint Test Action Group (JTAG) standard and operates according to the Inter-Integrated Circuit (I2C) protocol, together with auxiliary I/O circuitry. As used herein, the term "JTAG" generally refers to an industry standard for verifying designs and testing printed circuit boards after manufacture. As used herein, the term "I2C" generally refers to a serial protocol for a two-wire interface that connects low-speed devices such as microcontrollers, I/O interfaces, and other similar peripherals in embedded systems.
The back-end portion 119 is configured to be coupled to one or more types of memory devices (e.g., DRAM media 126) via a plurality of channels 125, which may be used to read/write data to/from the memory devices 126, to transmit commands to the memory devices 126, to receive status and statistics from the memory devices 126, and so on. The management unit 135 may couple the memory controller 100 to external circuitry or external devices, such as the host 103, that may generate requests to read or write data to and/or from the memory devices, by initializing and/or configuring the memory controller 100 and/or the corresponding memory devices 126. The management unit 135 is configured to recognize commands received from the host 103 and to execute instructions that apply specific operation codes associated with the received host commands to each of the plurality of channels coupled to the memory device 126.
The back end portion 119 includes a media controller portion including a plurality of media controllers 120 and a PHY layer portion including a plurality of PHY interfaces 122. In some embodiments, the back-end portion 119 is configured to couple the PHY interfaces 122 to multiple memory ranks of the memory device 126. The memory ranks may be connected to the memory controller 100 via a plurality of channels 125. A respective media controller 120 and its corresponding PHY interface 122 may drive a channel 125 to a memory rank. In some embodiments, each media controller 120 may execute commands independently of the other media controllers 120. Thus, data may be transferred from one PHY interface 122 to a memory device 126 over a channel 125 without being affected by other PHY interfaces 122 and channels 125.
Each PHY interface 122 may operate according to a PHY layer that couples the memory controller 100 to one or more memory ranks in the memory device 126. As used herein, the term PHY layer generally refers to the physical layer in the Open Systems Interconnection (OSI) model of computing systems. The PHY layer may be the first (e.g., lowest) layer of the OSI model and may be used to transfer data over a physical data-transmission medium. In some embodiments, the physical data-transmission medium may be the plurality of channels 125.
As used herein, the term "memory rank" (also rendered herein as "memory level") generally refers to a plurality of memory chips (e.g., DRAM memory chips) that may be accessed simultaneously. In some embodiments, a memory rank may be sixty-four (64) bits wide, and each memory rank may have eight (8) pages. In some embodiments, the page size of a first type of memory device may be greater than the page size of a second type of memory device. However, example embodiments are not limited to a particular memory rank width or page size.
Each media controller 120 may include channel control circuitry 124 and a plurality of group control circuitry 128, wherein a respective one of the plurality of group control circuitry 128 is configured to access a respective group 130 of the plurality of groups of the media device 126 accessed by the respective media controller 120. As described in more detail below, in embodiments of the present disclosure, a memory error detector, or more specifically a corresponding per-group row hammer mitigation circuitry 132, is configured for each group 130 in each channel.
The stages, channels, and groups may be considered as hardware-dependent logical groupings of storage locations in the media device. The mapping of the stage, channel, and group logical groupings to physical storage devices or rows in the memory device 126 may be preconfigured or, in some embodiments, may be configured by a memory controller in communication with the memory device 126.
In some embodiments, memory controller 100 may be a Compute Express Link (TM) (CXL) compliant memory system (e.g., the memory system may include a PCIe/CXL interface). CXL is a high-speed central processing unit (CPU)-to-device and CPU-to-memory interconnect designed to accelerate next-generation data center performance. CXL technology maintains memory coherency between the CPU memory space and memory on attached devices, allowing resource sharing for higher performance, reduced software stack complexity, and lower overall system cost.
CXL is intended to be an industry-open standard interface for high-speed communications, as accelerators are increasingly used to supplement CPUs in support of emerging applications such as artificial intelligence and machine learning. CXL technology builds on the peripheral component interconnect express (PCIe) infrastructure, utilizing the PCIe physical and electrical interface to provide advanced protocols in the areas of input/output (I/O) protocols, memory protocols (e.g., initially allowing hosts to share memory with accelerators), and coherence interfaces. When the memory controller 100 complies with CXL, the interface management circuitry 108 (including data link and transaction control) may manage the interface 106, which may include a PCIe PHY interface, using the CXL protocol.
According to some embodiments, memory device 126 includes one or more DRAM devices. In some embodiments, main memory is stored in DRAM cells that have high storage density. DRAM cells lose their state over time; that is, the cells must be refreshed periodically, hence the name "dynamic". DRAM may be described as being organized according to a hierarchy of memory organization that includes DIMMs, ranks, banks, and arrays. A DIMM comprises a plurality of DRAM chips organized into one or more "ranks" (the "stages" or "levels" referred to elsewhere herein), and each chip is formed of multiple "banks" (the "groups" referred to elsewhere herein). A bank is formed of one or more rows of the memory cell array. All banks within a rank share all address and control pins. All banks are independent, but in some embodiments only one bank in a rank may be accessed at a time. Because of electrical constraints, only a few DIMMs can be attached to a bus. Ranks help increase the capacity of a DIMM.
Multiple DRAM chips are used for each access to increase data transfer bandwidth. Multiple banks are provided so that the computing system can handle different requests simultaneously. To maximize density, the arrays in a bank are made larger, the rows wider, and the row buffer wider (a 64 B request reads an 8 KB row). Each array provides a single bit to the output pins in one cycle (to allow high density, there are few pins). DRAM chips are generally described as xN, where N refers to the number of output pins; one rank may consist of eight x8 DRAM chips (e.g., for a 64-bit data bus). Banks and ranks provide memory parallelism, and the memory controller 100 may schedule memory accesses to maximize row buffer hit rates and bank/rank parallelism.
In the embodiment shown in FIG. 1A, the memory device 126 is a Low-Power Double Data Rate (LPDDR) LP5 device or uses another similar memory interface. However, embodiments are not so limited, and memory device 126 may include one or more memory media of any type that is subject to row hammer attacks or similar memory attacks, such as, but not limited to, DRAM types.
Each of the plurality of media controllers 120 may receive the same command and address and drive the plurality of channels 125 substantially simultaneously. By using the same commands and addresses for the plurality of media controllers, each of the plurality of media controllers 120 can perform the same memory operation on the same plurality of memory units using the plurality of channels 125. Each media controller 120 may correspond to a RAID component. As used herein, the term "substantially" means that the characteristic need not be absolute, but rather close enough to achieve the advantage of the characteristic.
For example, the "substantially simultaneous" is not limited to operations performed absolutely simultaneously, and may include timing that is intended to be simultaneous but cannot be precisely simultaneous due to manufacturing limitations. For example, media controllers that are used "substantially simultaneously" may not be started or completed at the same time due to read/write latency that may be exhibited by various interfaces (e.g., LPDDR5 and PCIe). For example, the multiple memory controllers may be utilized such that they write data to the memory device simultaneously, regardless of whether one of the media controllers begins or ends before the other.
In FIG. 1A, row hammer mitigation component 132 is implemented at the per-group level in memory controller 100 of system 101. In contrast, in the memory controller 100' of the system 101' shown in FIG. 1B, the row hammer mitigation component 132 is implemented at the per-channel level. Alternatively, in the memory controller 100'' of the system 101'' shown in FIG. 1C, the row hammer mitigation component is implemented at the memory media device level.
FIG. 2A shows a schematic diagram of a memory bank 130 in a DRAM device such as memory device 126. The illustrated group 130 represents a 10x10 array of cells organized into 10 rows (e.g., row 202) and 10 columns (e.g., column 204). A group is written or read one row at a time via a row buffer 206. Each cell in the array is accessed by providing a row address and a column address. An address bus, a row address strobe, and a column address strobe (shown as A, RAS, and CAS, respectively, in FIG. 2A) are used to access a particular memory location in the array. The row buffer 206 and the data or read/write signals are used for data to be read from or stored to a memory location.
In some memory devices, a counter, not shown in fig. 2A, may be associated with a row to track the number of times a row is activated during a particular time interval. For example, a counter may be initialized at the beginning of each refresh interval and incremented each time the row is accessed during a refresh interval. In a conventional perfect tracking implementation, a respective counter is associated with each row. In an example embodiment, the number of counters maintained is much smaller than the total number of rows in a memory device attached to the memory controller.
FIG. 2B shows a flowchart 210 depicting a basic implementation flow of row hammer mitigation. Row hammer mitigation includes two aspects: the first is row hammer detection, and the second is the response to the detection. Various responses are possible; a response (e.g., a DRFM response) that instructs the memory device 126 to refresh the victim rows is one possible response that mitigates or eliminates the effects of row hammering. In some cases, the memory controller transmits a refresh command, such as a DRFM response, to the memory device 126 and specifies an aggressor row, and internal circuitry of the memory device determines the victim rows to refresh based on the aggressor row identified by the memory controller and refreshes them.
When a request is received to access a row in memory device 126, the requested row, which may be referred to in this disclosure as an "aggressor row" (row 207 in FIG. 2A), is identified as the next row to activate at operation 212. At operation 214, the value of a counter configured to track the number of accesses to the aggressor row within a predetermined time period is checked. At operation 216, it is determined whether the value of the counter is above the row hammer threshold (RHT). When the access count of aggressor row 207 exceeds the RHT, the integrity of the data in one or more rows physically adjacent to the aggressor row 207 (referred to as "victim rows"; see rows 208 and 209 in FIG. 2A) is not guaranteed. The RHT may be factory-set or may be configured at start-up, and may depend on the type of memory media device. If the value is above the RHT, a response is issued at operation 218.
One type of response may be a directed refresh management (DRFM) command to refresh the physically adjacent rows (e.g., rows 208 and 209) on either side of the aggressor row 207. When a response is issued at operation 218, the counters of the refreshed victim rows (e.g., rows 208 and 209) may be reset (e.g., set to 0). The number of physically adjacent rows to be refreshed may be preconfigured or may be dynamically determined. After issuing the response at 218, or if it is determined at operation 216 that the count for aggressor row 207 is not above the RHT, then at operation 220 row activation of the aggressor row is scheduled and the row's counter is incremented (e.g., by 1).
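The flow of FIG. 2B can be summarized in a short sketch. The following Python fragment is illustrative only, assuming one counter per row (perfect tracking) and hypothetical helper names (issue_refresh, schedule_activate, adjacent_rows) standing in for the queueing and media-command machinery described later:

```python
# Minimal sketch of the row hammer mitigation flow of FIG. 2B, assuming
# one counter per row; RHT and all helper names are illustrative.

RHT = 4800                 # example row hammer threshold (media-dependent)
counters = {}              # row_id -> activations in the current refresh window

def on_row_activate(row_id, issue_refresh, schedule_activate, adjacent_rows):
    if counters.get(row_id, 0) > RHT:        # operations 214-216: check counter vs. RHT
        issue_refresh(row_id)                # operation 218: e.g., a DRFM naming the aggressor
        for victim in adjacent_rows(row_id): # the controller may also reset the counters
            counters[victim] = 0             # of the refreshed victim rows
    schedule_activate(row_id)                # operation 220: schedule the activation...
    counters[row_id] = counters.get(row_id, 0) + 1  # ...and increment the row's counter
```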
As mentioned above, the memory device 126 (e.g., one or more DRAM DIMMs) may be subject to row hammer attacks, and various methods are used to eliminate or reduce the impact of such attacks. Whereas conventional row hammer mitigation techniques currently implemented in memory systems are, to the best of the inventors' knowledge, deficient in terms of energy efficiency and/or space efficiency, example embodiments of the present disclosure provide a row hammer mitigation technique that provides perfect tracking of row hammer attacks (i.e., does not allow any false negative row hammer detections) in a practical, energy-efficient, and space-saving manner.
As shown in FIG. 3, in some example scenarios where a DRAM memory device is attached to a CXL-compliant memory controller, the global rate of row hammer attacks on the memory device may be approximately 625 million attacks per second. Thus, if perfect row hammer detection is implemented at the global level for the attached memory device, the row hammer detector must be configured with enough counters to detect at least as many attacks as occur within one second.
For example, in the example embodiment shown in FIG. 1A, if perfect row tracking were implemented globally, central controller 110 could be configured with row hammer mitigation circuitry that potentially receives row access information for rows in an attached memory device from the media controllers 120 at a rate of 625 million accesses per second and communicates mitigation responses (e.g., DRFM) to the respective media controllers 120 as needed.
If per-channel row hammer mitigation is implemented in each media controller 120, then the attack rates that the respective media controllers 120 can handle must sum to at least 625 million/second. Such an implementation would be able to track significantly higher row update rates, but uses correspondingly more space and energy resources, because the resources are configured on a per-channel basis.
Similarly, if per-group row hammer mitigation is implemented in each group controller 128 for each group in a channel, then the attack rates that all of the group controllers can handle must sum to at least 625 million/second. Such an embodiment would be able to sustain significantly higher detection rates, again using correspondingly more space and energy resources, because the resources are configured on a per-group basis. Thus, the total space and energy resources required to perform row hammer detection at the group level exceed those required at the channel level, which in turn exceed those of a global-level implementation.
Thus, various approaches can be considered to achieve perfect (deterministic) row hammer tracking in a memory controller, such as accessing multiple rows as one unit (the same row on different chips) and thereby keeping only one counter for that group of rows instead of a counter for each row of the media device.
The approach to mitigating row hammer attacks described in this disclosure is motivated by the observation that keeping a separate counter for each media row on the media device is very space-inefficient, so other implementations with a smaller memory footprint for the counters are needed to provide perfect tracking for row hammer detection. Instead of dedicating a respective row activation counter to each row in the media device, example embodiments in this disclosure alias each counter to represent multiple rows in the media device.
That is, each counter counts a different set of media lines, where the set includes more than one media line. By allocating each counter to track accesses to more than one media line, the total number of counters that need to be maintained may be reduced. In addition, the counter incrementing and threshold testing required by example embodiments may be accomplished very efficiently in three clock cycles. The tradeoff of saving space by having multiple rows per counter rather than one row per counter is that the shared counter may reach RHT more frequently because the counter is now incremented when any of the multiple rows assigned to it is accessed.
However, because row hammer events have been observed to be rare (i.e., cases where the number of accesses to a particular media row exceeds RHT within a refresh interval are rare), example embodiments seek to provide perfect tracking while optimizing space savings and frequent events (e.g., counter increment and threshold test at each row access) at the expense of less frequent events (e.g., counter exceeding RHT).
Aliasing avoids the large amount of space required to track accesses to each row individually. Considering that the memory footprint of each row already contains the bytes the user observes, the extra bytes required to count accesses on a per-DRAM-row basis in a large (e.g., terabyte-scale) memory system add up quickly. If, for example, 16 rows in a row group are aliased into one counter, the number of required counters is reduced by a factor of 16. The aliasing technique in the disclosed embodiments thus amortizes each counter over multiple rows of memory.
FIG. 4 illustrates a logical block diagram of row hammer mitigation component 132 at the per-group level, in accordance with some embodiments. The row hammer mitigation component 132 is replicated for each group of the attached memory devices 126 accessed by the memory controller 100. As shown in FIG. 1A, each media controller 120 accessing media device 126 may have a plurality of row hammer mitigation components 132, such that each group controlled by the channel corresponding to that media controller has a corresponding respective row hammer mitigation component 132.
The row hammer mitigation component 132 includes a row hammer detector 133 and an SRAM 134. The row hammer detector 133 includes circuitry that monitors the corresponding group of the memory device 126 for row hammer attacks and responds appropriately when an attack or potential attack is detected. SRAM 134 is used by row hammer detector 133 to maintain the counters and other state associated with row hammer detection for the corresponding group. Additional state associated with row hammer detection may be maintained in D flip-flops associated with row hammer detector 133.
In the embodiments primarily described in this disclosure, the plurality of row hammer mitigation components 132 are included in the memory controller 100. Including row hammer mitigation circuitry, such as the plurality of row hammer mitigation components 132, in memory controller 100 is advantageous because all accesses to memory devices 126 protected by row hammer mitigation circuitry flow through memory controller 100.
However, embodiments of the present disclosure are not limited to implementing the plurality of row hammer mitigation components 132 in the memory controller 100. In some embodiments, the per-group row hammer mitigation components 132 may be implemented external to the memory controller. Furthermore, embodiments are not limited to storing the counters in SRAM; in some embodiments, memory 134 may be of a memory type other than SRAM while still implementing serial searches of the counters.
In some embodiments, rather than implementing a row hammer detector at each set of levels, as shown in fig. 4 and 1A, the row hammer detector may be implemented at each channel level (i.e., with one row hammer detector component 132 per channel 120 on memory controller 100, as shown in fig. 1B, for example) or at the media device level (i.e., with one row hammer component 132 per media device 126 on memory controller 100, as shown in fig. 1C, for example).
Fig. 5 schematically illustrates circuitry 500 for row hammer mitigation using an aliased row activation counter used in the devices shown in fig. 1A-1C, for example, in accordance with some embodiments. In an example embodiment, the row hammer detector component 132 (see, e.g., fig. 1A and 4) includes row activation counter (also referred to as an ACT counter) circuitry 502, a refresh response queue 504, a media command queue 506, and a priority arbiter 508.
In an example embodiment, row hammer component 132, and thus circuitry 500, is implemented on memory controller 100. Circuitry 500 receives a media line access command 510 from host system 103. Some media line access commands 510 may be generated internally to the memory controller 100 (e.g., cache 112 miss). Each media line access command 510 received in circuitry 500 is input 512 to media command queue 506. The media command queue 506 may be implemented as a memory buffer for storing at least one media line access command before the command is provided to a media device 518 (e.g., the media device 126 connected to the memory controller 100 in fig. 1A).
Each media line access command also causes counter circuitry 502 to increment 514 and perform threshold test 516 in parallel. The threshold for testing is referred to herein as a "configured trigger threshold" set equal to RHT or below RHT (e.g., threshold = RHT-1). The efficient incrementing and testing process is described with respect to fig. 7. If threshold test 516 detects that the counter exceeds the configured trigger threshold, one or more refresh commands that provide a row hammer response are input to refresh response queue 504.
For example, a row activation of row A (e.g., a row access to row A) enqueues a row activate command for row A in media command queue 506, in parallel with incrementing the counter corresponding to row A and testing it against the threshold in circuitry 502. If the counter for row A exceeds the configured threshold, a row hammer event is detected on row A and, in response, one or more refresh commands refreshing the adjacent rows of row A are enqueued in refresh response queue 504.
The priority arbiter 508 selects commands to be sent to the memory media device 518 from among the commands enqueued in the media command queue 506 and the refresh response queue 504. The arbiter 508 may give priority to the refresh response queue 504 when, in a given cycle, the next command scheduled for execution in the media command queue 506 and the next command scheduled for execution in the refresh response queue 504 are for the same row. In this way, the arbiter can ensure that the row is refreshed before the next access.
In an example embodiment, if circuitry 502 is set with a configured trigger threshold that is at least one less than the RHT (i.e., threshold = RHT-1) and an access to row A exceeds the configured trigger threshold, then the access to row A and the corresponding refresh command (resulting from exceeding the configured trigger threshold) may be the next commands to be scheduled from command queue 506 and refresh response queue 504, respectively. In that sequence of events, by scheduling the row A refresh command before the row A access command, priority arbiter 508 can ensure that the row access command is answered with correct data.
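The arbiter's same-row rule can be sketched as follows. This is an illustrative Python fragment, not the patented RTL; the Cmd type and the default ordering outside the same-row case are assumptions:

```python
# Sketch of priority arbiter 508 of FIG. 5 (illustrative). When the heads of
# both queues target the same row, the refresh wins, so the victim rows are
# refreshed before the pending access is serviced.

from collections import deque, namedtuple

Cmd = namedtuple("Cmd", ["kind", "row"])   # kind: "ACT" or "DRFM" (assumed shape)

media_cmd_q = deque()   # media command queue 506
refresh_q   = deque()   # refresh response queue 504

def arbitrate():
    """Select the next command to drive to media device 518 (assumed policy)."""
    if refresh_q and media_cmd_q and refresh_q[0].row == media_cmd_q[0].row:
        return refresh_q.popleft()      # same row: refresh first, so the access
                                        # that follows sees correct data
    if media_cmd_q:
        return media_cmd_q.popleft()    # otherwise accesses proceed normally
    if refresh_q:
        return refresh_q.popleft()      # drain refreshes when no access pends
    return None
```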
In some example embodiments, row hammer circuit component 132 includes circuitry 502, the refresh response queue 504, the media command queue 506, and the priority arbiter 508. In another embodiment, row hammer circuit component 132 includes circuitry 502, the refresh response queue 504, and the priority arbiter 508. In yet another embodiment, the media command queue 506 and/or the priority arbiter 508 are implemented external to the row hammer circuit component 132, elsewhere in the device.
FIG. 6A schematically illustrates an aliased row activation counter 604 organized in an SRAM (e.g., SRAM 134 of FIG. 4) used in an implementation such as that shown in FIG. 5, according to some embodiments. The row activation counters 604 track N media rows 602 (i.e., media rows corresponding to row identifiers 0...N, where N is an integer greater than 0).
The row activation counters 604 are a set of aliased counters such that each counter in the set 604 tracks a different subset of two or more rows of the set of rows 602. Map 606 maps the two or more row identifiers of each subset of rows 602 to a respective corresponding one of the counters 604. In some embodiments, the mapping may select a counter based on a subset of the bits in each row identifier. In the illustrated embodiment, the aliasing factor is X; that is, the N rows are represented by N/X counters, where X is a power of 2. If, for example, X = 8 and N = 1024, then the number of counters required is N/X = 128.
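As an illustration of such a map, the sketch below selects a counter by dropping the low-order bits of the row identifier, so X consecutive row IDs share one counter. This particular bit selection is an assumption; any many-to-one map onto N/X counters would serve as map 606:

```python
# Sketch of aliasing map 606: N protected rows share N/X counters, where the
# aliasing factor X is a power of two. Here the counter index is the row
# identifier with log2(X) low-order bits dropped (an illustrative choice).

N = 1024                 # rows tracked (the example from the text)
X = 8                    # aliasing factor
NUM_COUNTERS = N // X    # 128 counters instead of 1024

def counter_index(row_id: int) -> int:
    return (row_id // X) % NUM_COUNTERS

assert counter_index(0) == counter_index(7)   # rows 0..7 alias to counter 0
assert counter_index(8) == 1                  # rows 8..15 alias to counter 1
```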
FIG. 6B illustrates a table showing the effect of the aliasing factor on the aliased row activation counter table size, according to some example embodiments. The table in FIG. 6B shows the number of counter words required for each of the aliasing factors 1, 2, 4, 8, 16, and 32 when 128 million rows are to be monitored. As shown in the figure, whereas one-to-one row-counter tracking (i.e., an aliasing factor of 1) requires more than 134 million counter words, the number of counters required at an aliasing factor of 32 drops dramatically to approximately 4 million.
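The trend is easy to reproduce. The short loop below assumes the figure's roughly 128 million monitored rows and ignores the word-packing/EDAC overhead that makes the factor-1 entry in the figure somewhat larger than the raw row count:

```python
# Counters required per aliasing factor for ~128 million monitored rows
# (illustrative; ignores SRAM word packing and EDAC overhead).
N_ROWS = 128_000_000
for x in (1, 2, 4, 8, 16, 32):
    print(f"aliasing factor {x:2d}: {N_ROWS // x:>11,} counters")
```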
FIG. 7 shows a flowchart of a row hammer mitigation process 700 using aliased row activation counters, according to some example embodiments. Process 700 may be implemented in row hammer mitigation component 132. According to an embodiment, in memory controller 100 a respective row hammer mitigation component 132, corresponding to each of the groups of the attached media device 126, has its own circuitry to perform process 700 independently. For example, FIG. 4 illustrates a respective row hammer mitigation component 132 implemented for each of a plurality of groups in a channel. However, embodiments are not limited to per-group row hammer mitigation; as mentioned above, in some embodiments row hammer mitigation component 132 is implemented per channel or per memory device.
Process 700 may begin at operation 702, during which a counter table associated with row hammer mitigation is initialized. The counter table has an initial state and must be cleared periodically. In some embodiments, this operation may be triggered/initiated in response to a ping-pong reset operation of a respective refresh (tREF) interval of the memory device 126. Reset will clear all row activation counters. An example ping-pong reset mechanism is described with reference to fig. 8A-8C. Another alternative reset technique is described with respect to fig. 9.
At operation 704, an incoming memory medium access request (row access request) is received for a particular row of memory devices 126 attached to the memory controller 100. The requested row may be referred to as an aggressor row.
At operation 706, a row activation counter, more specifically the row activation counter corresponding to the row (aggressor row) to be accessed in the media device, is read from SRAM 134. For example, the row activation counter is read from counter table 604 based on the row address of the aggressor row (in row address table 602) and aliasing map 606. In some examples, one counter word is read at a time to increase efficiency (e.g., an implementation may select one of the counters from the word that is read, reducing array error detection and correction (EDAC) overhead).
At operation 708, EDAC is performed and the value of the counter is incremented. Each increment raises the value of the counter by 1. Alternatively, the increment may be performed with saturation (i.e., the counter holds at its maximum value rather than wrapping).
At operations 710-712, the new counter value is written back to the counter (i.e., the counter value is updated) while the value is also compared to the configured trigger threshold. In an example embodiment, the configured trigger threshold is set based on the memory media RHT. For example, in one embodiment, the configured trigger threshold may be set equal to the RHT. In another embodiment, the configured trigger threshold is set to a value less than the RHT (e.g., RHT-1). Operation 712 thus determines whether the aggressor row has been accessed enough times that it may now, or soon (e.g., within one or a few more accesses), corrupt an adjacent row.
If it is determined at operation 712 that the aggressor row has been accessed beyond the configured trigger threshold, then at operation 714 a response, such as a signal (e.g., a DRFM command) instructing the memory device to refresh the victim row of the identified aggressor row, is transmitted to the memory device. In some embodiments, the memory device includes circuitry that receives the aggressor row ID and in response refreshes the victim row. In some embodiments, the response may identify a victim row to refresh.
During operation 714, the count value corresponding to the aggressor row ID is set to 0 because the victim row of the aggressor row has been refreshed in response to the transmitted response. In some embodiments, the counter values of the victim rows may also be reset to reflect their refresh in the memory device.
If it is determined at operation 712 that the counter of the aggressor row does not exceed the configured trigger threshold, then the process 700 returns to operation 704 to await the next row identifier. Alternatively, at the end of operation 714, the process 700 returns to operation 704 to await the next row identifier.
It should be noted that process 700 may be implemented in a manner that highly optimizes the most frequent case, updating and comparing the counters. For example, in an example embodiment, each of operations 706 (reading the aliased counter value), 708 (incrementing the counter value), and 710 (updating the counter in SRAM and comparing it to the configured trigger threshold) is performed in one clock cycle, so that operations 706-710 complete in a total of three clock cycles.
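A compact sketch of this read-increment-compare-write-back loop appears below. It is illustrative Python under assumed constants (RHT, counter width, and the bit-dropping aliasing map from the earlier sketch); EDAC and SRAM word packing are abstracted away, and in hardware each of the three commented stages occupies one clock cycle:

```python
# Sketch of process 700 against an aliased counter table held in SRAM.
# The trigger threshold is RHT-1 so a refresh issues just before RHT would
# be exceeded. All constants are illustrative assumptions.

RHT = 4800
TRIGGER = RHT - 1                  # configured trigger threshold
X, NUM_COUNTERS = 8, 128           # aliasing factor and counter-table size
COUNTER_MAX = (1 << 16) - 1        # example counter width; saturate, don't wrap

sram = [0] * NUM_COUNTERS          # aliased row activation counters (table 604)

def on_access(row_id, send_drfm):
    idx = (row_id // X) % NUM_COUNTERS   # aliasing map 606
    count = sram[idx]                    # 706: read the counter (EDAC checked here)
    count = min(count + 1, COUNTER_MAX)  # 708: saturating increment
    if count > TRIGGER:                  # 712: compare to the configured threshold
        send_drfm(row_id)                # 714: refresh the aggressor's victim rows
        count = 0                        #      and clear the shared counter
    sram[idx] = count                    # 710: write the new value back
```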
As mentioned above, in normal operating situations, it is relatively rare that the row hammer threshold is exceeded. However, in example embodiments, since the row activation counter is aliased, the configured trigger threshold (which is based on the RHT configuration) may be exceeded slightly more frequently, but is still expected to occur only rarely compared to the more frequent update and comparison events described above, and thus does not negatively impact performance.
Fig. 8A-8C illustrate a ping-pong implementation of initializing an aliasing row counter in an aliasing counter row hammer mitigation implementation such as described with respect to fig. 5 and 7, according to some example embodiments.
Any counter-based logic needs to be reset. Since different memory media rows are refreshed at different times determined internally by the media device, the ideal time to reset the external counter for a particular media row may be configured or determined at startup. A periodic ping-pong reset is a type of safe counter reset that is possible when paired counters are used in a ping-pong scheme. The time between resets of any particular counter may be configurable, and in some embodiments is, for example, 128 ms (2x tREF). According to an embodiment, during a first tREF period an ACT counter is in a pre-processing state in which it is counting but not sending DRFM. During the next tREF period, the ACT counter is counting and may trigger a DRFM command or other row hammer response.
A second set of ACT counters operates similarly, but offset by tREF relative to the first set. One set of ACT counters is therefore always active and able to trigger DRFM commands.
This approach eliminates false-negative DRFM triggers. However, it requires two sets of counters, which means twice the silicon area and power. In addition, a potential DRFM is sent based on counts accumulated over up to 2x tREF.
Fig. 8A shows an example high-level schematic illustration of a ping-pong reset scheme using two aliased counter tables, table a 802 and table B804. The aliased row activation counter tables 802 and 804 may each have the same structure and counters for the same row.
When an increment event 806 is received for an aggressor row, the counter corresponding to the aggressor row is identified using the aliasing map 606, and the corresponding counter is updated in each of tables 802 and 804. Although both tables are updated for each incoming operation, at any given time one of the tables is considered the active table. For the comparison 810 against the configured trigger threshold, the value of the aggressor row's counter is taken from whichever table is active at that time. Based on the result of the comparison 810, a row hammer response 812 may be issued as described above with respect to operation 714.
The row activation counter tables 802 and 804 are referred to as ping pong because the active table state alternates between the two tables. At regular intervals, the reset circuit 808 resets all counters in one of the tables 802 or 804. The most recently reset table is used as a backup table and the other table is used as an active table.
FIG. 8B illustrates the table update procedure over a time period that includes a plurality of refresh cycles 824, identified as cycles I-O. A time-staggered manner of updating table A (operation sequence 820) and updating table B (operation sequence 822) is shown. For clarity of explanation, sequence 820 is shown beginning before sequence 822, to indicate that the i-th reset of table A occurs one refresh interval 824 before the i-th reset of table B. The bottom row in the figure indicates which table is considered the active table in each of the refresh intervals I-O.
FIG. 8C illustrates a schematic diagram of a ping-pong reset circuit 840, according to some embodiments. An external clock serves as the input to circuit 840. Based on the input clock, the 0-tREF counter 842 generates one pulse per tREF (e.g., every 64 ms). Based on the pulse from counter 842, flip-flop 844 toggles its output between 0 and 1 every tREF. When ACT counter B 848 becomes active (i.e., can trigger DRFM), ACT counter A 846 is reset, and vice versa.
ACT counter A 846 and ACT counter B 848 always receive and count all memory media row activations (ACTs). Both ACT counter A and ACT counter B trigger a DRFM (or other row hammer mitigation response) when the memory media RHT count (or configured trigger threshold) is reached, but based on the state of flip-flop 844, only the "active" ACT counter may actually trigger the DRFM; the other ACT counter is ignored. Component 850 generates the DRFM and passes it to the memory media controller for transmission to the memory device.
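The scheme reduces to a few lines of logic. The sketch below is illustrative Python with assumed table sizes and threshold: both tables count every activation, only the active one may trigger, and each tREF tick swaps the roles and clears the table leaving active duty:

```python
# Sketch of the ping-pong reset of FIGS. 8A-8C (sizes and threshold assumed).

NUM_COUNTERS = 128
TRIGGER = 4799
tables = [[0] * NUM_COUNTERS, [0] * NUM_COUNTERS]   # tables A and B
active = 0                          # index of the table allowed to trigger

def on_tref_tick():
    """Called once per tREF (the role of flip-flop 844 and reset circuit 808)."""
    global active
    active ^= 1                               # backup table (with tREF of history)
    tables[active ^ 1] = [0] * NUM_COUNTERS   # becomes active; the other is reset

def on_activation(idx, send_response):
    for t in tables:                    # both tables always count (counters 846/848)
        t[idx] += 1
    if tables[active][idx] > TRIGGER:   # only the active table may trigger DRFM
        send_response(idx)
        tables[active][idx] = 0         # the backup is left to its periodic reset
```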
FIG. 9 illustrates an alternative reset circuit 900 that can safely reset the ACT counters 604 without doubling the counter-table space that the ping-pong reset circuit 840 requires. According to an embodiment, the reset circuit 900 includes an ACT media row history FIFO 904, on the basis of which the ACT row counters 902 (e.g., the aliased row counters 604) are reset.
In operation, when a counter increment occurs 916 for row A and the value is compared to the configured trigger threshold (e.g., operation 710 in process 700 described above), the row A identifier is pushed into FIFO 904 in parallel with updating row A's counter in the aliased ACT counters 902. Similarly, for each counter update, the corresponding row identifier is pushed into FIFO 904. When the FIFO is full, the oldest row identifier pops out, and for each row identifier popped from FIFO 904, the corresponding row counter in the aliased row counters 902 is reset.
After a refresh command for a row, the row activation count for that row is 0. A complication, however, is that the memory controller may not know when particular rows are refreshed; the controller may know only whether tREF has elapsed since the last refresh. One naive solution is to push the row identifier into the FIFO at each refresh; eventually (e.g., 60 seconds later), the row identifier pops out and the row is refreshed again.
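A minimal sketch of this history-FIFO reset follows; the depth, the table size, and the idea of storing the (row ID, counter index) pair together are illustrative assumptions:

```python
# Sketch of the FIFO-based reset of FIG. 9. Each counter update pushes the row
# identifier into a fixed-depth history FIFO in parallel; when the FIFO is
# full, the oldest identifier pops and its aliased counter is reset.

from collections import deque

FIFO_DEPTH = 4096                # illustrative depth
history = deque()                # ACT media row history FIFO 904
counters = [0] * 128             # aliased ACT row counters 902

def on_counter_update(row_id, idx):
    """Called in parallel with the increment-and-compare of process 700."""
    history.append((row_id, idx))
    if len(history) > FIFO_DEPTH:       # FIFO full: oldest row identifier pops...
        _, old_idx = history.popleft()
        counters[old_idx] = 0           # ...and its corresponding counter is reset
```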
FIG. 10 illustrates a table of different row hammer attack-response techniques that may be implemented in example embodiments. Response techniques include biasing cache replacement, adding refreshes, ACT/PRE of neighbor rows, poor DRFM, DRFM, EDAC clear of neighbors, restricting row activation of CRHA rows (e.g., rows with high row activation counts), responding to row access requests with Data Bad (data error) messages, and alerting and/or interrupting the host to indicate that the RHT has been exceeded. As shown in the table, several response techniques guarantee data integrity and do not require a repair map of the media device. Guaranteed data integrity improves the reliability of reading data from the media device, and eliminating the repair map increases the flexibility of a response technique and its applicability to media devices from different manufacturers, among other benefits. As can be seen in FIG. 10, DRFM guarantees data integrity, does not require a repair map for the media device, and has a low response time, indicating that it can be used effectively in example embodiments. However, embodiments are not limited to using DRFM as the response to a row hammer attack.
Implementing the row activation counters, and optionally the aliased index from row identifier to counter, in SRAM makes the aliased row activation counter algorithm efficient in terms of both the circuit area required and energy usage. In some embodiments, the disclosed row hammer detector implementations use single-port SRAM. Single-port SRAM, which can be read or written in a given cycle but not both simultaneously, is a cheaper form of SRAM. As mentioned above, since operations 706-710 may be completed in very few clock cycles, a previous access request to the memory media device may be completed before action on the next access request is initiated, allowing the use of a single-port SRAM that permits reading or writing at a given time, but not both. The SRAM implementation of the large amount of state required by the algorithm makes it practical to perform row hammer detection for a large amount of memory media within the controller, and this in turn enables row hammer detection functions to be removed from the controlled memory media, simplifying the memory media and reducing their cost and power consumption.
As mentioned above, example embodiments optimize the more frequent events associated with row hammer mitigation, i.e., updating and comparing the row activation counters, as opposed to the rarer event (under normal operating conditions) of a row activation counter value exceeding the RHT of the memory media device. For example, as described above, updating and comparing a row activation counter may be performed in three clock cycles in some embodiments, while responding to a detected row hammer attack may require many clock cycles.
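The common-case path can be sketched as a read-modify-write against the counter SRAM, roughly one step per clock cycle. The exact correspondence of operations 706-710 to the three cycles, and the helper names used here (alias_index from the sketch above, and the issue_mitigation placeholder), are assumptions made for illustration.

```python
def issue_mitigation(row_id: int) -> None:
    """Placeholder for the rare, slow response path (e.g., a dRFM
    command refreshing the physical neighbors of the aggressor row)."""
    pass

def process_access(row_id: int, counters: list[int],
                   trigger_threshold: int) -> bool:
    """Common-case counter update, roughly one step per clock cycle."""
    idx = alias_index(row_id)           # index into the aliased counters
    count = counters[idx]               # cycle 1: read single-port SRAM
    count += 1                          # cycle 2: increment and compare
    hammer = count > trigger_threshold
    counters[idx] = count               # cycle 3: write back
    if hammer:
        issue_mitigation(row_id)        # infrequent under normal operation
    return hammer
```

Because each step touches the SRAM at most once per cycle, a single-port SRAM suffices as long as a new access is not issued before the write-back completes.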
The aliased row activation counters implemented in the example embodiments are simple, energy efficient, and reasonably space saving. However, because each counter is shared by several rows, the system will ultimately perform more mitigation actions than an implementation with a single counter per row: mitigation commands are typically time consuming, and a shared counter can reach RHT up to an aliasing factor faster. Furthermore, each time the RHT is exceeded, more mitigation messages are sent (i.e., more rows in the group means more neighbors). This tradeoff is acceptable because, under normal operating conditions, the counter of a row group should not be exceeded, so mitigation is rarely triggered. Example embodiments aim to detect the abnormal situation inexpensively. However, since row hammer attacks can occur, a secondary objective of example embodiments is to minimize the impact on performance when a row hammer attack is detected.
Additionally, in example embodiments, upon detection of a row hammer attack, more row hammer response refresh messages may be sent to the memory media device than in an implementation with a separate counter per row. Having more than one row associated with a counter that exceeds RHT does not by itself mean that the number of affected rows (i.e., the number of adjacent rows) is greater than in a one-row-per-counter implementation, but example embodiments may generate a refresh response for every row associated with the counter because the memory controller may not be aware of the mapping from the row identifiers of the memory media device to the actual physical row layout.
The efficiency of the aliased groups in example embodiments may be improved by overlapping the aliased groups with refresh groups. For example, the memory media device may be configured to refresh all of its rows by refreshing a respective different group of rows at each of a plurality of refresh times. When the aliased groups (the different subgroups of rows mapped to each row activation counter) overlap with the refresh groups, the counting process is optimized because all rows in each group have the same refresh interval.
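One way to realize the overlap is sketched below, under the assumption that the device refreshes rows in fixed groups identified by the low-order bits of the row identifier; the group count and the mapping are illustrative, not taken from this disclosure.

```python
NUM_GROUPS = 8192   # assumed number of refresh groups (= aliased counters)

def refresh_group(row_id: int) -> int:
    """Assumed grouping: rows refreshed together share their low-order
    row identifier bits."""
    return row_id % NUM_GROUPS

def alias_index(row_id: int) -> int:
    """Overlap the aliased group with the refresh group: all rows
    mapped to a counter then share the same refresh interval."""
    return refresh_group(row_id)

def on_group_refresh(group: int, counters: list[int]) -> None:
    """Every row aliased to this counter was just refreshed, so its
    activation count can safely return to zero."""
    counters[group] = 0
```

With this arrangement, no separate reset bookkeeping is needed: the refresh schedule itself bounds the counting window of every counter.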
Example embodiments may also limit waterfall or domino attacks, in which an attacker attempts to corrupt a user's data by arranging row activations so that several adjacent groups sit just below the configured trigger threshold: when one group exceeds the threshold, its mitigation may push another group over the threshold, and so on.
Furthermore, in example embodiments, by ensuring that the configured trigger threshold is less than RHT, so that media rows are always refreshed just before a media access request that would exceed RHT is received, the worst-case denial of service (DoS) latency impact on subsequent memory accesses may be limited to the duration of a single row hammer response.
While the invention has been described in connection with what is presently considered to be the most practical and preferred embodiment, it is to be understood that the invention is not to be limited to the disclosed embodiment, but on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.
It should be noted that the methods described above describe possible embodiments, and that operations and steps may be rearranged or otherwise modified, and that other embodiments are possible. Furthermore, portions from two or more of the methods may be combined.
The information and signals described herein may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof. Some figures may illustrate signals as a single signal; however, a signal may represent a signal bus, where the bus may have a variety of bit widths.
The terms "transmit," "connect," and "couple" may refer to the relationship between components supporting signal flow between the components. Components are considered to be in electrical communication (or conductive contact or connection or coupling) with each other if there are any conductive paths between the components that can support the flow of signals between the components at any time. At any given time, the conductive paths between components that are in electronic communication (or in conductive contact, connection, or coupling) with each other may be open or closed, depending on the operation of the device containing the connected components. The conductive paths between connected components may be direct conductive paths between components or the conductive paths between connected components may be indirect conductive paths, which may include intermediate components such as switches, transistors, or other components.
In some examples, signal flow between connected components may be interrupted for a period of time, for example using one or more intermediate components, such as switches or transistors. The term "coupled" refers to a transition from an open circuit relationship between components (where signals are not currently able to communicate between the components through conductive paths) to a closed circuit relationship between components (where signals are able to communicate between the components through conductive paths). If a component, such as a controller, couples other components together, the component initiates a change that allows signals to flow between the other components through conductive paths that previously did not allow signals to flow.
The terms "if," "when," "based on," or "based at least in part on" are used interchangeably. In some examples, the terms "if," when..times., "based on," or "based at least in part on" are used to describe a connection between conditional actions, conditional processes, or process portions, then these terms may be interchanged.
The term "responsive to" may refer to a condition or action that occurs at least partially, if not completely, as a result of a prior condition or action. For example, a first condition or action may be performed and a second condition or action may occur at least in part as a result of a previous condition or action occurring (whether occurring directly after or after one or more other intermediate conditions or actions occurring after the first condition or action).
In addition, the term "directly responsive" or "directly responsive" may refer to a condition or action that occurs as a direct result of a prior condition or action. In some examples, a first condition or action may be performed and a second condition or action may occur directly as a result of a previous condition or action occurring, regardless of whether other conditions or actions occur. In some examples, a first condition or action may be performed and a second condition or action may occur directly as a result of a previous condition or action occurring such that no other intermediate condition or action occurs between the earlier condition or action and the second condition or action, or a limited number of one or more intermediate steps or actions occur between the earlier condition or action and the second condition or action. Any condition or action described herein as being performed "based on," "at least in part on," or "in response to" some other step, action, event, or condition may additionally or alternatively (e.g., in alternative examples) "be performed in direct response" or "directly in response to" such other condition or action, unless otherwise specified.
The devices discussed herein, including memory arrays, may be formed on semiconductor substrates such as silicon, germanium, silicon-germanium alloys, gallium arsenide, gallium nitride, and the like. In some examples, the substrate is a semiconductor wafer. In some other examples, the substrate may be a silicon-on-insulator (SOI) substrate, such as silicon-on-glass (SOG) or silicon-on-sapphire (SOP), or an epitaxial layer of semiconductor material on another substrate. The conductivity of the substrate, or of sub-regions of the substrate, may be controlled by doping with various chemical species including, but not limited to, phosphorus, boron, or arsenic. Doping may be performed during the initial formation or growth of the substrate, by ion implantation, or by any other doping means.
The description set forth herein in connection with the appended drawings describes example configurations and is not intended to represent all examples that may be implemented or that are within the scope of the claims. The term "exemplary" as used herein means "serving as an example, instance, or illustration," and not "preferred" or "advantageous over other examples." The detailed description includes specific details that provide an understanding of the described techniques. These techniques, however, may be practiced without these specific details. In some instances, well-known structures and devices are shown in block diagram form in order to avoid obscuring the concepts of the described examples.
The functions described herein may be implemented in hardware, software executed by a processor, firmware, or any combination thereof. If implemented in software for execution by a processor, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Other examples and embodiments are within the scope of the present disclosure and the appended claims. For example, due to the nature of software, the functions described above may be implemented using software executed by a processor, hardware, firmware, hardwired or a combination of any of these. Features that perform functions may also be physically located at various locations including being distributed such that portions of the functions are performed at different physical locations.
For example, the various illustrative blocks and components described in connection with the disclosure herein may be implemented or performed with a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any processor, controller, microcontroller, or state machine. A processor may be implemented as a combination of computing devices (e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration).
As used herein, including in the claims, "or" as used in a list of items (e.g., a list of items ending with a phrase such as "at least one of" or "one or more of") indicates an inclusive list, such that, for example, a list of at least one of A, B, or C means A or B or C or AB or AC or BC or ABC (i.e., A and B and C). Also, as used herein, the phrase "based on" shall not be construed as a reference to a closed set of conditions. For example, an exemplary step that is described as "based on condition A" may be based on both condition A and condition B without departing from the scope of the present disclosure. In other words, as used herein, the phrase "based on" shall be construed in the same manner as the phrase "based at least in part on."
Computer-readable media includes both non-transitory computer storage media and communication media including any medium that facilitates transfer of a computer program from one location to another. The non-transitory storage medium may be any available medium that can be accessed by a general purpose or special purpose computer. By way of example, and not limitation, non-transitory computer-readable media can comprise RAM, ROM, electrically erasable programmable read-only memory (EEPROM), compact Disk (CD) ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other non-transitory medium that can be used to carry or store desired program code means in the form of instructions or data structures and that can be accessed by a general-purpose or special-purpose computer or general-purpose or special-purpose processor. Also, any connection is properly termed a computer-readable medium.
For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include CD, laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above are also included within the scope of computer-readable media.
It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting. As used herein, the singular forms "a," "an," and "the" may include both singular and plural referents unless the context clearly dictates otherwise. In addition, "a number of," "at least one," and "one or more" (e.g., a number of memory banks) can refer to one or more memory banks, whereas "a plurality of" means more than one of such things.
Furthermore, the words "can" and "may" are used throughout this application in a permissive sense (i.e., having the potential to, being able to), not in a mandatory sense (i.e., must). The term "include" and its derivatives mean "including, but not limited to." The terms "coupled" and "coupling" refer to a physical connection, direct or indirect, or to access to and movement (transmission) of commands and/or data, depending on the context. The terms "data" and "data value" are used interchangeably herein and may have the same meaning, depending on the context.
A DRAM is organized as an array of memory cells, with each cell storing a programmed value. As mentioned above, a cell can lose its programmed value if it is not periodically refreshed. Thus, rows are refreshed at a fixed interval, commonly referred to as the "refresh interval." Refresh is also referred to as "row activation." In a row activation, a row in a DRAM device is read, error corrected, and written back to the same physical row. In recent DRAM devices, data corruption caused by a "row hammer event" (also referred to as a "row hammer attack") is a significant risk.
A row hammer event occurs when a particular row in the media device is accessed too many times, i.e., a "Row Hammer Threshold (RHT)" number of times, within an "activation interval" (i.e., an interval between two refresh/activation events). In particular, when a particular row ("aggressor row") is accessed more than RHT times during an activation interval, one or more rows in the DRAM media that are physically adjacent to the particular row ("victim rows") may be affected by frequent activation of the particular row and data corruption of the one or more rows may occur.
Due to various physical effects of shrinking manufacturing process geometries, the RHT of memory devices has decreased to a level at which even normal computer system programs can inadvertently corrupt their own data or the data of another program sharing the same system memory. Traditional row hammer detection techniques are either practical but imperfect, allowing data corruption or severe performance degradation, or perfect but requiring resource costs too high to be practical.
Traditional row hammer detector algorithms, such as "address sampling" and "priority CAM" (priority content addressable memory), are probabilistic and therefore cannot guarantee perfect (i.e., complete, accurate, and precise) prevention of data corruption in any and all row hammer scenarios. If an aggressor (e.g., a malicious attacker) knows enough about the details of these traditional row hammer detection methods and their implementations, the aggressor can exploit their vulnerabilities to bypass or defeat them and corrupt data.
The "direct" or "perfect" row tracking method is a known perfect row hammer detection algorithm in which a counter is maintained for each row in the DRAM medium, but its implementation requires a large amount of memory and operating power, which are too high to be practical.
Guaranteed elimination of row hammer events is attractive for any memory device, but is particularly attractive for systems such as hyperscale data centers (HSDC). In an HSDC, a processor and memory are typically shared by multiple clients. A malicious attacker may use a row hammer attack to silently (e.g., without detection) corrupt other customers' data, potentially escalating their privileges to control more system resources or compromising data center security.
Currently, row hammer damage is indistinguishable from other soft errors. Modern workloads can thrash processor caches and cause unexpected row hammer scenarios. When detected errors exceed a threshold rate, physical repair of the dual in-line memory module (DIMM) is required, and the DIMM is typically returned to the vendor for reimbursement.
The description herein is provided to enable any person skilled in the art to make or use the disclosure. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the scope of the disclosure. Thus, the disclosure is not limited to the examples and designs described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (19)

1. An apparatus, comprising:
a memory media device interface configured to connect to a memory media device;
at least one memory comprising a plurality of counters, wherein a total number of counters of the plurality of counters is less than a total number of rows in the memory media device that are available for data storage, and each counter of the plurality of counters is associated with a respective group of rows of the memory media device that are available for data storage; and
a first circuit configured to perform operations comprising:
reading a counter of the plurality of counters based on a row identifier of a memory media access request;
incrementing a count value of the counter;
triggering a response to an error in the memory media device if the incremented count value exceeds a configured trigger threshold; and
writing the incremented count value to the counter.
2. The apparatus of claim 1, wherein the at least one memory is a static random access memory, SRAM.
3. The apparatus of claim 2, wherein the at least one memory is a single-port SRAM.
4. The apparatus of claim 1, wherein the total number of counters of the plurality of counters is less than a total number of rows monitored by a memory error detector on the memory media device.
5. The apparatus of claim 1, wherein each of the respective groups of rows in the memory media device that are available for data storage comprises a plurality of rows.
6. The apparatus of claim 5, wherein each of the respective groups of rows in the memory media device that are available for data storage comprises the same number of rows.
7. The apparatus of claim 1, wherein the reading, the incrementing, and the writing are performed together in three consecutive clock cycles.
8. The apparatus of claim 1, further comprising a second circuit, wherein the second circuit is configured to receive a first request to be transmitted to the memory media device and a second request to be transmitted to the memory media device, and to select a request from the first request and the second request to transmit to the memory media device, wherein the first request is a request to read or write data and the second request is a request to refresh one or more rows in the memory media device.
9. The apparatus of claim 8, wherein the second circuit is configured to select the second request when the first request and the second request are associated with a same row.
10. The apparatus of claim 1, wherein the configured trigger threshold is configured to a value below a row hammer threshold of the memory media device.
11. The apparatus of claim 1, wherein the plurality of counters is a first plurality of counters and the at least one memory includes a second plurality of counters, wherein a total number of counters in the first plurality of counters and in the second plurality of counters is the same, and wherein the first circuit is further configured to increment values of corresponding counters in the first plurality of counters and in the second plurality of counters in response to the memory media access request, and to perform resets of the first plurality of counters in a time-staggered manner relative to resets of the second plurality of counters.
12. The apparatus of claim 1, wherein the at least one memory comprises a first-in-first-out queue having a total number of queue entries equal to the total number of counters of the plurality of counters, and the first circuit is further configured to input a corresponding counter identifier into the first-in-first-out queue in response to resetting each counter of the plurality of counters, and to reset a corresponding counter of the plurality of counters in response to each counter identifier output from the first-in-first-out queue.
13. The apparatus of claim 1, wherein the triggered response comprises a digital refresh management, DRFM, command to refresh one or more physically adjacent rows of rows corresponding to the row identifier.
14. The apparatus of claim 1, wherein the first circuit is further configured to clear the plurality of counters at respective refresh intervals of the memory media device.
15. The apparatus of claim 14, wherein all rows in each of the respective groups of rows in the memory media device that are available for data storage are refreshed at the same refresh interval.
16. A method, comprising:
configuring a plurality of counters in at least one memory, wherein a total number of counters of the plurality of counters is less than a total number of rows in a memory media device that are available for data storage, and each counter of the plurality of counters is associated with a respective group of rows of the memory media device that are available for data storage;
reading, by a first circuit, a counter of the plurality of counters based on a row identifier of a memory media access request;
incrementing a count value of the counter by the first circuit;
triggering, by the first circuit, a response to an error in the memory media device if the incremented count value exceeds a configured trigger threshold; and
writing, by the first circuit, the incremented count value to the counter.
17. The method of claim 16, wherein the at least one memory is a static random access memory, SRAM.
18. A system, comprising:
a host system;
a memory media device; and
a memory controller, comprising:
a first interface configured to connect to the host system;
a second interface configured to connect to the memory media device;
at least one memory comprising a plurality of counters, wherein a total number of counters of the plurality of counters is less than a total number of rows in the memory media device that are available for data storage, and each counter of the plurality of counters is associated with a respective group of rows of the memory media device that are available for data storage; and
circuitry configured to perform operations comprising:
reading a counter of the plurality of counters based on a row identifier of a memory media access request;
incrementing a count value of the counter;
triggering a response to an error in the memory media device if the incremented count value exceeds a configured trigger threshold; and
writing the incremented count value to the counter.
19. The system of claim 18, wherein the at least one memory is a static random access memory, SRAM.
CN202310050896.3A 2022-01-22 2023-01-20 Aliased row hammer detector Pending CN116486887A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US63/302,051 2022-01-22
US17/941,655 2022-09-09
US17/941,655 US20230238046A1 (en) 2022-01-22 2022-09-09 Aliased row hammer detector

Publications (1)

Publication Number Publication Date
CN116486887A true CN116486887A (en) 2023-07-25

Family

ID=87210807

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310050896.3A Pending CN116486887A (en) 2022-01-22 2023-01-20 Aliased row hammer detector

Country Status (1)

Country Link
CN (1) CN116486887A (en)

Similar Documents

Publication Publication Date Title
JP4866646B2 (en) How to select commands to send to memory, memory controller, computer system
US11527280B2 (en) Monitoring and mitigation of row disturbance in memory
US11915788B2 (en) Indication in memory system or sub-system of latency associated with performing an access command
US20230236735A1 (en) Area-optimized row hammer mitigation
CN116324994A (en) Mitigating row hammer attacks
CN115668377A (en) Refresh management for DRAM
US12032482B2 (en) Dual cache for row hammer mitigation
Hassan et al. A case for self-managing DRAM chips: Improving performance, efficiency, reliability, and security via autonomous in-DRAM maintenance operations
WO2022126578A1 (en) Dynamic interval for memory device to enter low power state
US20230237152A1 (en) Row hammer interrupts to the operating system
US20230236739A1 (en) Cache-assisted row hammer mitigation
US11994990B2 (en) Memory media row activation-biased caching
US20230238046A1 (en) Aliased row hammer detector
CN116486887A (en) Aliased row hammer detector
US20220108743A1 (en) Per bank refresh hazard avoidance for large scale memory
US20230282258A1 (en) Finite time counting period counting of infinite data streams
US11474746B2 (en) Refresh management for DRAM
US12119043B2 (en) Practical and efficient row hammer error detection
CN116504295A (en) Feasible and efficient hammer error detection
CN116543824A (en) Finite time count period count for unlimited data streams
US12131768B2 (en) Dynamic random access memory (DRAM) multi-wordline direct refresh management including aliasing row counter policy for row hammer mitigation
US20230238045A1 (en) Dynamic random access memory multi-wordline direct refresh management
CN116486884A (en) Cache-assisted row hammer mitigation
US20230178136A1 (en) Memory device detecting weakness of operation pattern and method of operating the same
US20240272806A1 (en) Memory device cache synchronization

Legal Events

Date Code Title Description
PB01 Publication