WO2017039948A1 - Memory device error check and scrub mode and error transparency - Google Patents

Memory device error check and scrub mode and error transparency

Info

Publication number
WO2017039948A1
Authority
WO
WIPO (PCT)
Prior art keywords
memory
ecs
error
mode
dram
Prior art date
Application number
PCT/US2016/045640
Other languages
French (fr)
Inventor
John B. Halbert
Kuljit S. Bains
Original Assignee
Intel Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intel Corporation
Priority to EP21193306.4A (EP3992973A1)
Priority to EP16842532.0A (EP3341941B1)
Priority to CN201680050250.3A (CN107924705B)
Publication of WO2017039948A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1008Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices
    • G06F11/1048Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices using arrangements adapted for a specific error detection or correction feature
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1008Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices
    • G06F11/1068Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices in sector programmable memories, e.g. flash disk
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614Improving the reliability of storage systems
    • G06F3/0619Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • G06F3/064Management of blocks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0673Single storage device
    • G06F3/0679Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11CSTATIC STORES
    • G11C29/00Checking stores for correct operation ; Subsequent repair; Testing stores during standby or offline operation
    • G11C29/04Detection or location of defective memory elements, e.g. cell construction details, timing of test signals
    • G11C29/08Functional testing, e.g. testing during refresh, power-on self testing [POST] or distributed testing
    • G11C29/12Built-in arrangements for testing, e.g. built-in self testing [BIST] or interconnection details
    • G11C29/38Response verification devices
    • G11C29/42Response verification devices using error correcting codes [ECC] or parity check
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11CSTATIC STORES
    • G11C29/00Checking stores for correct operation ; Subsequent repair; Testing stores during standby or offline operation
    • G11C29/52Protection of memory contents; Detection of errors in memory contents
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/03Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words
    • H03M13/05Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits
    • H03M13/09Error detection only, e.g. using cyclic redundancy check [CRC] codes or single parity bit
    • H03M13/095Error detection codes other than CRC and single parity bit codes
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/65Purpose and implementation aspects
    • H03M13/6566Implementations concerning memory access contentions
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11CSTATIC STORES
    • G11C11/00Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
    • G11C11/21Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements
    • G11C11/34Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices
    • G11C11/40Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors
    • G11C11/401Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors forming cells needing refreshing or charge regeneration, i.e. dynamic cells

Definitions

  • Patent Application No. TBD [P88609], entitled “MEMORY DEVICE CHECK BIT READ MODE”; and, Patent Application No. TBD [P93260], entitled “MEMORY DEVICE ON-DIE ECC (ERROR CHECKING AND CORRECTING) CODE”; both filed concurrently herewith.
  • the descriptions are generally related to memory management, and more particular descriptions are related to error checking and correction in a memory subsystem with a memory device that performs internal error checking and correction.
  • DRAM dynamic random access memory
  • SEC single error correction
  • On-die ECC can be used in addition to system level ECC, but the system level ECC has no insight into what error correction has been performed at the memory device level. Thus, while on-die ECC can handle errors inside a memory device, errors can accumulate undetected by the host system.
  • Figure 1 is a block diagram of an embodiment of a system in which a memory device monitors errors with an error check mode.
  • Figure 2 is a block diagram of an embodiment of a system that performs internal error correction and stores error information.
  • Figure 3 is a block diagram of an embodiment of a system in which a register stores a number of rows with errors, and a maximum error for any row.
  • Figure 4A is a block diagram of an embodiment of command encoding that enables an error check and scrub (ECS) mode.
  • ECS error check and scrub
  • Figure 4B is a block diagram of an embodiment of a mode register that enables an error check and scrub (ECS) mode.
  • Figure 4C is a block diagram of an embodiment of a multipurpose register to store an address of a row with a maximum error count.
  • Figure 4D is a block diagram of an embodiment of a multipurpose register to store a count of a number of rows containing an error.
  • Figure 5 is a block diagram of an embodiment of logic at a memory device that generates error correction information and supports an error check and scrub mode.
  • Figure 6 is a flow diagram of an embodiment of a process for monitoring one or more error counts via error correction operations in an error check and scrub (ECS) mode.
  • Figure 7 is a block diagram of an embodiment of a computing system in which an error check and scrub mode with error tracking can be implemented.
  • Figure 8 is a block diagram of an embodiment of a mobile device in which an error check and scrub mode with error tracking can be implemented.
  • a memory device mode enables error monitoring.
  • the error monitoring mode can be referred to by any label.
  • An error monitoring mode as described herein enables the performing of error checking and correction (ECC) and the counting of a total number of memory segments having errors as well as a segment having the highest number of errors.
  • ECC error checking and correction
  • a segment or portion would be a row of the memory device, where each data chunk within the row (e.g., a prefetch size of data) can be tested for an error.
  • ECC refers herein to any process and/or operation or group of operations to check the validity of stored data based on error checking and correction data, and to perform some form of error correction routine based on the operation or process.
  • the error monitoring mode is an error check and scrub (ECS) mode to enable a DRAM (dynamic random access memory device) to perform one or more ECC operations, and count errors.
  • DRAM dynamic random access memory device
  • a memory controller associated with the memory device triggers the ECS mode with a trigger sent to the memory device.
  • the host can control the mode and receive error count information.
  • the memory device includes multiple addressable memory locations, which can be organized in segments such as wordlines or rows or other portions. The memory locations store data and have associated ECC information.
  • In the ECS mode, the memory device reads one or more memory locations and performs ECC for one or more memory locations based on ECC information stored within the memory device. Thus, the memory device performs internal ECC in the ECS mode.
  • the memory device counts error information including a segment count indicating a number of segments having at least a threshold number of errors, and a maximum count indicating a maximum number of errors in any segment.
  • Memory devices generally refer to volatile memory technologies. Volatile memory is memory whose state (and therefore the data stored on it) is indeterminate if power is interrupted to the device. Nonvolatile memory refers to memory whose state is determinate even if power is interrupted to the device. Dynamic volatile memory requires refreshing the data stored in the device to maintain state.
  • SDRAM synchronous DRAM
  • a memory subsystem as described herein may be compatible with a number of memory technologies, such as DDR3 (double data rate version 3, original release by JEDEC (Joint Electron Device Engineering Council) on June 27, 2007, currently on release 21), DDR4 (DDR version 4, initial specification published in September 2012 by JEDEC), DDR4E (DDR version 4, extended, currently in discussion by JEDEC), LPDDR3 (low power DDR version 3, JESD209-3B, Aug 2013 by JEDEC), LPDDR4 (low power double data rate (LPDDR) version 4, JESD209-4, originally published by JEDEC in August 2014), WIO2 (Wide I/O 2 (WideIO2), JESD229-2, originally published by JEDEC in August 2014), HBM (high bandwidth memory DRAM, JESD235, originally published by JEDEC in October 2013), DDR5 (DDR version 5), LPDDR5 (currently in discussion by JEDEC), HBM2 (HBM version 2, currently in discussion by JEDEC).
  • reference to memory devices can refer to a nonvolatile memory device whose state is determinate even if power is interrupted to the device.
  • the nonvolatile memory device is a block addressable memory device, such as NAND or NOR technologies.
  • a memory device can also include future generation nonvolatile devices, such as a three dimensional crosspoint memory device, other byte addressable nonvolatile memory devices, or memory devices that use chalcogenide phase change material (e.g., chalcogenide glass).
  • the memory device can be or include multi-threshold level NAND flash memory, NOR flash memory, single or multi-level Phase Change Memory (PCM), a resistive memory, nanowire memory, ferroelectric transistor random access memory (FeTRAM), magnetoresistive random access memory (MRAM), memory that incorporates memristor technology, or spin transfer torque (STT)-MRAM, or a combination of any of the above, or other memory.
  • PCM Phase Change Memory
  • FeTRAM ferroelectric transistor random access memory
  • MRAM magnetoresistive random access memory
  • STT spin transfer torque
  • the memory device or DRAM can refer to the die itself and/or to a packaged memory product.
  • a memory device can count errors and expose the error information to a host system.
  • On-die or internal ECC within a memory device typically refers to error checking and correction of single bit errors (SBEs) in the memory array.
  • SBEs single bit errors
  • a DRAM compatible with DDR4E or a variant or extension can apply ECS combined with error counting or monitoring.
  • a DDR4E DRAM can scrub errors and track the error accumulation of the device.
  • a memory device keeps an error count indicating a number of rows that have at least one error, and tracks a row address having the highest number of errors.
  • DDR4E devices incorporate on-die ECC and include rows or wordlines having multiple addressable locations of 128-bit data chunks.
  • the data chunks can have associated ECC bits (e.g., 8 bits of internal error correction data).
  • ECS mode can provide transparency support for such DDR4E devices.
  • ECS mode includes an error check and scrub operation that incorporates an error count mechanism as part of the ECC operation.
  • ECS mode can enable a DRAM to internally read, correct SBEs, and write back corrected data to the array. Such reading, correcting, and writing back can be referred to as "scrubbing" errors.
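The read, correct, and write-back cycle described above can be sketched in a few lines. This is a minimal illustration, not the device's actual circuit: it assumes a Hamming single-error-correcting code, which happens to need exactly 8 check bits for a 128-bit chunk, but the text does not say which code the on-die ECC uses. The names `encode`, `extract`, `syndrome`, and `scrub`, and the dictionary standing in for the memory array, are hypothetical.

```python
# Sketch only: a Hamming single-error-correcting (SEC) code is ASSUMED here;
# the source does not say which code the on-die ECC uses.
DATA_BITS = 128   # data chunk size given in the text
CHECK_BITS = 8    # internal check bits per chunk given in the text
N = DATA_BITS + CHECK_BITS  # 136-bit code word, positions 1..N

# Check bits sit at power-of-two positions; data fills the remaining positions.
DATA_POSITIONS = [p for p in range(1, N + 1) if p & (p - 1)]


def syndrome(cw):
    """XOR of the positions of all set bits; 0 means no detected error."""
    s = 0
    for pos in range(1, N + 1):
        if cw[pos]:
            s ^= pos
    return s


def encode(data):
    """Build a code word (list indexed 1..N) from a 128-bit integer."""
    cw = [0] * (N + 1)
    for i, pos in enumerate(DATA_POSITIONS):
        cw[pos] = (data >> i) & 1
    s = syndrome(cw)
    for k in range(CHECK_BITS):
        cw[1 << k] = (s >> k) & 1  # choose check bits so the syndrome becomes 0
    return cw


def extract(cw):
    """Recover the 128-bit integer from a code word."""
    return sum(cw[pos] << i for i, pos in enumerate(DATA_POSITIONS))


def scrub(array, address):
    """Read one code word, correct a single-bit error, write it back.

    Returns True if an error was corrected, False if the word was clean.
    """
    cw = array[address]              # internal read
    s = syndrome(cw)
    if s == 0:
        return False
    if s > N:
        raise ValueError("syndrome points outside the code word: uncorrectable")
    corrected = cw[:]                # work on a copy of the stored word
    corrected[s] ^= 1                # the syndrome is the position of the bad bit
    array[address] = corrected       # write back corrected data
    return True
```

A plain SEC code of this kind can miscorrect when two or more bits are wrong, which is one reason the text has the device count errors and report them to the host rather than correcting silently.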
  • a memory device that supports ECS mode includes one or more registers to store error count information.
  • the registers can be DRAM Mode Registers.
  • the register locations can include multipurpose registers (MPRs) where the memory device can store error count information including a number of segments having errors, and a maximum count of errors and/or an address of a segment having a maximum error count.
  • MPRs multipurpose registers
  • a memory device uses two registers in ECS mode to track code word and check bit errors detected during the ECS mode operation.
  • the memory device stores a value from a Row Error Counter to one register, and stores a value from an Errors per Row Counter to another register.
  • a row error counter tracks the number of rows that have at least a threshold number of code word and check bit errors detected.
  • the threshold number can be one error.
  • the threshold number can be two errors or some other number of errors.
  • the errors per row counter tracks the address of the row with the largest number of code word and check bit errors, and can include the code word and check bit error count for that row.
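The two counters can be modeled as follows. This is a rough sketch under stated assumptions: the class name `EcsCounters`, its fields, and the `record_row` and `reset` methods are hypothetical, and the per-row error totals are taken as inputs rather than derived from real ECC results.

```python
class EcsCounters:
    """Hypothetical model of the Row Error Counter and Errors per Row Counter."""

    def __init__(self, threshold=1):
        self.threshold = threshold     # errors a row needs before it is counted
        self.row_error_count = 0       # Row Error Counter
        self.max_row_errors = 0        # Errors per Row Counter: error count
        self.max_row_address = None    # Errors per Row Counter: row address

    def record_row(self, row_address, errors_in_row):
        """Update both counters after one row has been checked."""
        if errors_in_row >= self.threshold:
            self.row_error_count += 1
        if errors_in_row > self.max_row_errors:
            self.max_row_errors = errors_in_row
            self.max_row_address = row_address

    def reset(self):
        """Clear the counts, as the host might before a new ECS pass."""
        self.row_error_count = 0
        self.max_row_errors = 0
        self.max_row_address = None


# Usage: four rows checked, with made-up error totals per row.
counters = EcsCounters(threshold=1)
for address, errors in [(0x10, 0), (0x11, 1), (0x12, 3), (0x13, 2)]:
    counters.record_row(address, errors)
```

With a threshold of one, three of the four rows are counted and row 0x12 is reported as the worst row. Raising the threshold to two would drop row 0x11 from the row count without changing the worst-row result.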
  • the memory controller reads the error count information stored by the memory device in the registers.
  • the host system (e.g., the host operating system and/or the host CPU (central processing unit)) can utilize the error count information to improve system-level RAS (reliability, availability, and serviceability) of the memory subsystem.
  • RAS reliability, availability, and serviceability
  • a memory controller can extract multibit error information from the error information, and apply the multibit information to determine how to apply ECC to the system (e.g., by knowing where errors occurred in the memory).
  • the memory controller uses the error information from the memory device as metadata for improving SDDC (single device data correction) ECC operations targeting multibit errors.
  • FIG. 1 is a block diagram of an embodiment of a system in which a memory device monitors errors with an error check mode.
  • System 100 includes elements of a memory subsystem in a computing device.
  • Processor 110 represents a processing unit of a host computing platform that executes an operating system (OS) and applications, which can collectively be referred to as a "host" for the memory.
  • the OS and applications execute operations that result in memory accesses.
  • Processor 110 can include one or more separate processors. Each separate processor can include a single and/or a multicore processing unit.
  • the processing unit can be a primary processor such as a CPU (central processing unit) and/or a peripheral processor such as a GPU (graphics processing unit).
  • System 100 can be implemented as an SOC, or be implemented with standalone components.
  • Memory controller 120 represents one or more memory controller circuits or devices for system 100. Memory controller 120 represents control logic that generates memory access commands in response to the execution of operations by processor 110. Memory controller 120 accesses one or more memory devices 140. Memory devices 140 can be DRAMs in accordance with any of the technologies referred to above. In one embodiment, memory devices 140 are organized and managed as different channels, where each channel couples to buses and signal lines that couple to multiple memory devices in parallel. Each channel is independently operable. Thus, each channel is independently accessed and controlled, and the timing, data transfer, command and address exchanges, and other operations are separate for each channel. In one embodiment, settings for each channel are controlled by separate mode register or other register settings.
  • each memory controller 120 manages a separate memory channel, although system 100 can be configured to have multiple channels managed by a single controller, or to have multiple controllers on a single channel.
  • memory controller 120 is part of host processor 110, such as logic implemented on the same die or implemented in the same package space as the processor.
  • Memory controller 120 includes I/O interface logic 122 to couple to a system bus.
  • I/O interface logic 122 (as well as I/O 142 of memory device 140) can include pins, connectors, signal lines, and/or other hardware to connect the devices.
  • I/O interface logic 122 can include a hardware interface.
  • I/O interface logic 122 includes at least drivers/transceivers for signal lines. Typically, wires within an integrated circuit interface with a pad or connector to interface to signal lines or traces between devices.
  • I/O interface logic 122 can include drivers, receivers, transceivers, termination, and/or other circuitry to send and/or receive signal on the signal lines between the devices.
  • the system bus can be implemented as multiple signal lines coupling memory controller 120 to memory devices 140.
  • the system bus includes at least clock (CLK) 132, command/address (CMD) 134, data (DQ) 136, and other signal lines 138.
  • the signal lines for CMD 134 can be referred to as a "C/A bus" (or ADD/CMD bus, or some other designation indicating the transfer of commands and address information) and the signal lines for DQ 136 can be referred to as a "data bus."
  • independent channels have different clock signals, C/A buses, data buses, and other signal lines.
  • system 100 can be considered to have multiple "system buses," in the sense that an independent interface path can be considered a separate system bus.
  • a system bus can include strobe signaling lines, alert lines, auxiliary lines, and other signal lines.
  • the system bus includes a data bus (DQ 136) configured to operate at a bandwidth.
  • DQ 136 can have more or less bandwidth per memory device 140.
  • DQ 136 can support memory devices that have either a x32 interface, a x16 interface, a x8 interface, or other interface.
  • The convention "xN," where N is a binary integer, refers to an interface size of memory device 140, which represents a number of signal lines DQ 136 that exchange data with memory controller 120.
  • the interface size of the memory devices is a controlling factor on how many memory devices can be used concurrently per channel in system 100 or coupled in parallel to the same signal lines.
  • Memory devices 140 represent memory resources for system 100.
  • each memory device 140 is a separate memory die, which can include multiple (e.g., 2) channels per die.
  • Each memory device 140 includes I/O interface logic 142, which has a bandwidth determined by the implementation of the device (e.g., x16 or x8 or some other interface bandwidth), and enables the memory devices to interface with memory controller 120.
  • I/O interface logic 142 can include a hardware interface, and can be in accordance with I/O 122 of memory controller, but at the memory device end.
  • multiple memory devices 140 are connected in parallel to the same data buses.
  • system 100 can be configured with multiple memory devices 140 coupled in parallel, with each memory device responding to a command, and accessing memory resources 160 internal to each.
  • For a Write operation, an individual memory device 140 can write a portion of the overall data word.
  • For a Read operation, an individual memory device 140 can fetch a portion of the overall data word.
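The way parallel devices share one data word can be illustrated with simple bit slicing. This is a sketch under assumptions the text does not state: a 64-bit data word and a lane ordering in which device 0 holds the least significant bits. The function names are hypothetical.

```python
def devices_per_channel(channel_width, device_width):
    """Number of xN devices needed in parallel to fill the data bus."""
    if channel_width % device_width:
        raise ValueError("device width must divide the channel width")
    return channel_width // device_width


def split_word(word, device_width, num_devices):
    """Write path: slice the overall data word into one portion per device."""
    mask = (1 << device_width) - 1
    return [(word >> (i * device_width)) & mask for i in range(num_devices)]


def merge_word(slices, device_width):
    """Read path: reassemble the data word from the portion each device fetched."""
    word = 0
    for i, portion in enumerate(slices):
        word |= portion << (i * device_width)
    return word


# Usage: an assumed 64-bit word spread across x8 devices.
count = devices_per_channel(64, 8)
portions = split_word(0x1122334455667788, 8, count)
restored = merge_word(portions, 8)
```

Because each device sees only its own portion, on-die ECC in one device cannot see errors held by its neighbors, which is part of the motivation for exposing per-device error counts to the host.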
  • memory devices 140 are disposed directly on a
  • memory devices 140 can be organized into memory modules 130.
  • memory modules 130 represent dual inline memory modules (DIMMs).
  • DIMMs dual inline memory modules
  • memory modules 130 represent other organization of multiple memory devices to share at least a portion of access or control circuitry, which can be a separate circuit, a separate device, or a separate board from the host system platform.
  • Memory modules 130 can include multiple memory devices 140, and the memory modules can include support for multiple separate channels to the included memory devices disposed on them.
  • Memory devices 140 each include memory resources 160.
  • Memory resources 160 represent individual arrays of memory locations or storage locations for data. Typically, memory resources 160 are managed as rows of data, accessed via wordline (rows) and bitline (individual bits within a row) control. Memory resources 160 can be organized as separate channels, ranks, and banks of memory. Channels are independent control paths to storage locations within memory devices 140. Ranks refer to common locations across multiple memory devices (e.g., same row addresses within different devices). Banks refer to arrays of memory locations within a memory device 140. In one embodiment, banks of memory are divided into sub-banks with at least a portion of shared circuitry for the sub-banks.
  • memory devices 140 include one or more registers 144.
  • Registers 144 represent storage devices or storage locations that provide configuration or settings for the operation of the memory device.
  • registers 144 can provide a storage location for memory device 140 to store data for access by memory controller 120 as part of a control or management operation.
  • registers 144 include Mode Registers.
  • registers 144 include multipurpose registers. The configuration of locations within register 144 can configure memory device 140 to operate in a different "mode," where command and/or address information or signal lines can trigger different operations within memory device 140 depending on the mode.
  • Settings of register 144 can indicate configuration for I/O settings (e.g., timing, termination or ODT (on-die termination), driver configuration, and/or other I/O settings).
  • memory device 140 includes ODT 146 as part of the interface hardware associated with I/O 142.
  • ODT 146 can be configured as mentioned above, and provide settings for impedance to be applied to the interface to specified signal lines. The ODT settings can be changed based on whether a memory device is a selected target of an access operation or a non-target device. ODT 146 settings can affect the timing and reflections of signaling on the terminated lines. Careful control over ODT 146 can enable higher-speed operation with improved matching of applied impedance and loading.
  • Memory device 140 includes controller 150, which represents control logic within the memory device to control internal operations within the memory device. For example, controller 150 decodes commands sent by memory controller 120 and generates internal operations to execute or satisfy the commands. Controller 150 can be referred to as an internal controller. Controller 150 can determine what mode is selected based on register 144, and configure the access and/or execution of operations for memory resources 160 based on the selected mode. Controller 150 generates control signals to control the routing of bits within memory device 140 to provide a proper interface for the selected mode and direct a command to the proper memory locations or addresses.
  • memory controller 120 includes command (CMD) logic 124, which represents logic or circuitry to generate commands to send to memory devices 140.
  • the signaling in memory subsystems includes address information within or accompanying the command to indicate or select one or more memory locations where the memory devices should execute the command.
  • controller 150 of memory device 140 includes command logic 152 to receive and decode command and address information received via I/O 142 from memory controller 120. Based on the received command and address information, controller 150 can control the timing of operations of the logic and circuitry within memory device 140 to execute the commands. Controller 150 is responsible for compliance with standards or specifications.
  • memory controller 120 includes refresh (REF) logic 126.
  • Refresh logic 126 can be used where memory devices 140 are volatile and need to be refreshed to retain a deterministic state.
  • refresh logic 126 indicates a location for refresh, and a type of refresh to perform.
  • Refresh logic 126 can trigger self-refresh within memory device 140, and/or execute external refreshes by sending refresh commands.
  • system 100 supports all bank refreshes as well as per bank refreshes, or other all bank and per bank commands. All bank commands cause an operation of a selected bank within all memory devices 140 coupled in parallel. Per bank commands cause the operation of a specified bank within a specified memory device 140.
  • controller 150 within memory device 140 includes refresh logic 154 to apply refresh within memory device 140.
  • refresh logic 154 generates internal operations to perform refresh in accordance with an external refresh received from memory controller 120. Refresh logic 154 can determine if a refresh is directed to memory device 140, and what memory resources 160 to refresh in response to the command.
  • memory controller 120 includes error correction and control logic to perform system-level ECC for system 100.
  • System-level ECC refers to application of error correction at memory controller 120, and can apply error correction to data bits from multiple different memory devices 140.
  • memory controller 120 includes ECS 170, which represents circuitry or logic to enable an ECS mode in one or more memory devices 140.
  • ECS 170 includes logic to set a mode register 144 of memory device 140 to trigger ECS mode.
  • ECS 170 includes logic to encode and send a command to trigger the ECS mode in one or more memory devices 140.
  • ECS 170 includes logic to read error information from memory device 140.
  • controller 150 includes ECS logic 156, which represents logic in memory device 140 to enter ECS mode and perform one or more ECS operations.
  • ECS logic 156 can be considered to include internal or on-die ECC logic to perform ECC operations.
  • ECS logic 156 reads a setting of register 144 to determine if ECS mode is enabled.
  • ECS logic 156 determines that an ECS mode trigger has been received, based on command decoding by command logic 152.
  • ECS 170 of memory controller 120 sets one or more bits of register 144 to reset error information counts.
  • a DRAM has an on-die or internal oscillator (not specifically shown) to control the timing of internal operations.
  • memory controller 120 issues a series of ECS operation commands for memory device 140 to execute in ECS mode.
  • memory controller 120 simply puts memory device 140 into ECS mode and allows controller 150 to control the operations internally.
  • controller 150 can generate internal commands from the external commands to sequence through the memory locations of memory resources 160 to perform ECC.
  • controller 150 can generate the internal commands to sequence through the memory resources.
  • controller 150 controls at least a portion of the generation of memory location addresses for the ECS operations.
  • FIG. 2 is a block diagram of an embodiment of a system that performs internal error correction and stores error information.
  • System 200 represents components of a memory subsystem.
  • System 200 provides one example of a memory subsystem in accordance with an embodiment of system 100 of Figure 1.
  • System 200 can be included in any type of computing device or electronic circuit that uses memory with internal ECC, where the memory devices count error information.
  • Processor 210 represents any type of processing logic or component that executes operations based on data stored in memory 230 or to store in memory 230.
  • Processor 210 can be or include a host processor, central processing unit (CPU), microcontroller or microprocessor, graphics processor, peripheral processor, application specific processor, or other processor.
  • Processor 210 can be or include a single core or multicore circuit.
  • Memory controller 220 represents logic to interface with memory 230 and manage access to data of memory 230. As with the memory controller above, memory controller 220 can be separate from or part of processor 210. Processor 210 and memory controller 220 together can be considered a "host" from the perspective of memory 230, and memory 230 stores data for the host.
  • memory 230 includes DRAMs that have internal ECC (which may be referred to in the industry as DDR4E).
  • system 200 includes multiple memory resources 230.
  • Memory 230 can be implemented in system 200 in any type of architecture that supports access via memory controller 220 with use of internal ECC in the memory.
  • Memory controller 220 includes I/O (input/output) 222, which includes hardware resources to interconnect with corresponding I/O 232 of memory 230.
  • Memory 230 includes command execution 234, which represents control logic within the memory device to receive and execute commands from memory controller 220.
  • the commands can include a series of ECC operations for the memory device to perform in ECS mode to record error count information.
  • mode register 238 includes one or more multipurpose registers to store error count information.
  • mode register 238 includes one or more fields that can be set by memory controller 220 to enable the resetting of the error count information.
  • Memory 230 includes array 240, which represents the array of memory locations where data is stored in the memory device.
  • each address location 244 of array 240 includes associated data and ECC bits.
  • address locations 244 represent addressable chunks of data, such as 128-bit chunks, 64-bit chunks, or 256-bit chunks.
  • address locations 244 are organized as segments or groups of memory locations.
  • memory 230 includes multiple rows 242.
  • each row 242 is a segment or a portion of memory that is checked for errors.
  • rows 242 correspond to memory pages or wordlines.
  • Array 240 includes N rows 242, and rows 242 include M memory locations.
  • address locations 244 correspond to memory words, and rows 242 correspond to memory pages.
  • a page of memory refers to a granular amount of memory space allocated for a memory access operation.
  • array 240 has a larger page size to accommodate the ECC bits in addition to the data bits. Thus, a normal page size would include enough space allocated for the data bits, and array 240 allocates enough space for the data bits plus the ECC bits.
  • memory controller 220 includes ECS control 226 to manage an ECS mode for ECC operation and error counting in memory 230.
  • memory 230 includes internal ECC managed by ECS control 250.
  • Memory controller 220 manages system wide ECC, and can detect and correct errors across multiple different memory resources in parallel (e.g., multiple memory resources 230). Many techniques for system wide ECC are known, and can include managing memory resources in a way to spread errors across multiple parallel resources. By spreading errors across multiple resources, memory controller 220 can recover data even in the event of one or more failures in memory 230.
  • Memory failures are generally categorized as either soft errors or soft failures, which are transient bit errors typically resulting from random environmental conditions, or hard errors or hard failures, which are non-transient bit errors occurring as a result of a hardware failure.
  • ECS control 250 includes a count 252 of rows 242 that include an SBE.
  • ECS control 250 includes one or more counters.
  • SBE count 252 can be one of the counters.
  • ECS control 250 includes ECC logic (not specifically shown) to perform error checking and correction. When an error is detected in a row 242, ECS control 250 increments SBE count 252.
  • ECS control 250 includes max count and/or max address 254. In one embodiment, the max count can be kept in another counter. In one embodiment, the max address refers to an address of the segment or row 242 determined to have the highest number of errors.
  • memory 230 is part of a rank of memory resources.
  • memory includes multiple banks of memory resources, and each bank can be separately accessed. In one embodiment, all banks must be precharged and in the idle state prior to enabling ECS mode.
  • command execution 234 identifies a specific command sequence for ECS mode. In one embodiment, only a specific command sequence is permitted in ECS mode. While different implementations can vary, one example of an allowed command sequence could be as follows: ECS->DES->ACT->DES->WR->DES->PRE->DES.
  • NOP could be used in place of the deselect commands (DES), but a NOP command may require toggling a chip select (CS) bit and require the memory to decode the command, whereas a DES command allows the memory to simply remain idle for the cycle.
  • the command sequence can be described as an ECS command trigger, which may not be necessary if ECS mode is triggered via a mode register setting, followed by a Deselect (DES), an Activate (ACT), another DES, a Write (WR), another DES, a Precharge (PRE), and another DES.
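
The allowed sequence described above can be sketched as a small checker. This is a minimal illustration, not the device's command decoder: the function name `is_valid_ecs_sequence` and the `mode_register_trigger` flag are hypothetical, and the sketch assumes DES commands may appear anywhere between the other commands.

```python
# Example allowed sequence from the text: ECS, ACT, WR, PRE with DES in between.
EXAMPLE_SEQUENCE = ("ECS", "DES", "ACT", "DES", "WR", "DES", "PRE", "DES")


def is_valid_ecs_sequence(commands, mode_register_trigger=False):
    """Return True if the non-DES commands match the expected ECS order.

    When ECS mode is triggered through a mode register (assumption: the
    explicit ECS command is then omitted), the sequence starts at ACT.
    """
    expected = ["ACT", "WR", "PRE"]
    if not mode_register_trigger:
        expected = ["ECS"] + expected
    # DES leaves the device idle for the cycle, so it is ignored here.
    core = [c for c in commands if c != "DES"]
    return core == expected
```

For example, `is_valid_ecs_sequence(EXAMPLE_SEQUENCE)` accepts the sequence quoted above, while a sequence that skips the WR command is rejected.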
  • the ECS command places memory 230 in ECS mode.
  • the ECS command is to be followed by an ACT command tMOD later; the ACT command is to be followed by a WR command tRCD later; and, the WR command is to be followed by a PRE command WL + tWR + 10 CKs later, provided tECSc is satisfied.
  • the minimum time for the ECS Mode period is tECSc (which can be the larger of 45 CKs or 110 ns).
  • tMOD can refer to a timing delay for a mode register set command update
  • tRCD can refer to a timing delay from an ACT command to an internal read or write
  • WL can refer to a write latency
  • tWR can refer to a write recovery time
  • CK can refer to a clock cycle.
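
The timing relationships above can be worked through numerically. The following is a sketch under stated assumptions: the function names are hypothetical, all delays are expressed in clock cycles counted from the ECS command, and the example parameter values used afterwards are illustrative rather than taken from any specification.

```python
import math


def tecsc_in_clocks(tck_ns):
    """tECSc as the larger of 45 CKs or 110 ns, expressed in clock cycles."""
    return max(45, math.ceil(110 / tck_ns))


def ecs_schedule(tck_ns, tmod, trcd, wl, twr):
    """Earliest cycle for each command, with the ECS command at cycle 0.

    tmod, trcd, wl and twr are given in clock cycles (assumption).
    """
    act = tmod                  # ACT follows ECS by tMOD
    wr = act + trcd             # WR follows ACT by tRCD
    pre = wr + wl + twr + 10    # PRE follows WR by WL + tWR + 10 CKs
    return {"ECS": 0, "ACT": act, "WR": wr, "PRE": pre,
            "tECSc": tecsc_in_clocks(tck_ns)}
```

With a 1.25 ns clock, 110 ns is 88 clocks, so tECSc is 88 CKs; with a 2.5 ns clock, 110 ns is only 44 clocks, so the 45 CK floor applies.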
  • memory 230 ignores data I/O and address inputs of I/O 232 while in ECS mode, for a period of tMOD after an ECS command is issued. Ignoring the address inputs can include ignoring bank address (BA) values and bank group (BG) values.
  • I/O control for I/O 232 sets the data I/O and address inputs into a tri-state operation.
  • in the ECS->ACT->WR sequence (ignoring intervening DES commands), the ECS command will enable the ECS mode, and the ACT command will internally perform a row activation.
  • the row activation can occur for a row determined by an internal ECS Address Counter within ECS control 250 (not specifically shown).
  • the WR command will perform an internal Read Modify Write cycle for the activated row, with a column address determined by an internal ECS Address Counter within ECS control 250.
  • ECS control 250 can be part of an internal controller (such as controller 150 of system 100).
  • counters and control within ECS control 250 can be part of logic within an internal controller.
  • alternatively, ECS control 250 can be separate ECS control logic, such as a separate logic circuit or microcontroller, to implement the ECS operations.
  • the internal Read and Write cycle or Read Modify Write cycle reads the entire code word and check bits from array 240 (e.g., 128 data bits and 8 check bits), corrects an SBE in the code word or check bits, and writes the resultant code word and check bits back to the appropriate row 242 in array 240.
  • a PRE command exits the ECS mode and returns memory 230 to normal mode.
  • ECS control 250 includes logic to internally generate address information for performing ECS operations.
  • an ECS mode sequences through an entire memory area, such as a bank group.
  • ECS control 250 keeps track of the addresses for sequencing through memory 230.
  • memory controller 220 keeps track of certain address information and memory 230 keeps track of other address information. For example, consider an implementation where memory controller 220 identifies a bank group and memory 230 keeps track of row addresses and column addresses within the bank group. Alternatively, consider that memory controller 220 identifies a bank group and column, and memory 230 identifies the row. Many other implementations are possible.
  • the memory controller is responsible to provide enough ECC operation commands, and the memory device performs all addressing internally.
  • one ECS command causes an ECC operation on a single memory location, and the memory controller sends a sequence of ECS commands to perform ECC operations on many locations.
  • memory 230 counts the number of errors during the ECS mode operations.
  • the error count information can include the total number of rows or other segments that have errors, and the highest error count for any segment.
  • Figure 3 is a block diagram of an embodiment of a system in which a register stores a number of rows with errors, and a maximum error for any row.
  • System 300 illustrates components of an ECC system that keeps counts of error information, and implements an example of ECS in accordance with an embodiment of system 100 of Figure 1 and/or system 200 of Figure 2.
  • System 300 provides a representation of components to perform ECS mode operations.
  • the registers that store the end count are registers accessible to the host, and other components are within a controller on the memory device.
  • Address control 320 represents control logic to enable a memory device to manage internal addressing for ECS operations. As previously described, the memory device may be responsible for certain address information and the memory controller responsible for other address information. Thus, system 300 can have different implementations depending on the address information responsibility of the memory device.
  • address control 320 includes one or more ECS address counters to increment addresses for ECS commands. Address control 320 can include counters for column addresses, row addresses, bank addresses, bank group addresses, and/or a combination of these. In one embodiment, address control 320 includes a counter to increment the column address for a row or other segment for an ECS operation.
  • address control 320 increments the column address on each ECS WR command, which can be triggered or indicated by ANDing an internal ECS command signal (ECS CMD) with an internal write command as indicated by an internal column address strobe for a write (CAS WR), as illustrated with logic 314.
  • the internal signals refer to signal generated internally within a memory device by an internal controller, rather than specifying a command received from an external memory controller.
  • the internal commands are generated in response to external commands from the host.
  • the row counter will sequentially increment and read each code word and associated check bits in the next row until all of the rows within a bank are accessed.
  • the bank counter will sequentially increment and the next sequential bank within a bank group will repeat the process of accessing each code word and associated check bits until all banks and bank groups within the memory are accessed.
  • the column, row, bank, bank group, and/or other counters that generate internal addresses in address control 320 can be reset in response to a RESET condition for the memory subsystem and/or in response to a bit in a mode register (MRx Ay bit), as illustrated with logic 312. A reset can occur on a power cycle or other system restart.
  • the address information generated by address control 320 can be used to control decode logic (e.g., row and column decoder circuits within the memory device) to trigger the ECS operations.
  • system 300 includes multiplexer 370 to select between external address 372 and internal address information from address control 320.
  • control for mux 370 can be the ECS mode state, where in ECS mode the internal address generation is selected to access a memory location of the memory array, and when not in ECS mode external address 372 is selected to access the memory location of the memory array.
  • system 300 can be selectively more complex. Based on internal address information and/or external address information, system 300 can check all code words and check bits within the memory. More specifically, system 300 can perform ECS operations, which include ECC operations to read, correct, and write memory locations for all selected or specified addresses. After being operated on once, system 300 can be configured (e.g., via the size of the counters) to wrap the counters and restart the address count. For example, once all memory locations for a memory have been checked, a bank group counter can wrap and begin the process again with the next ECS command. The total number of ECS commands required to complete one cycle of ECS mode is density dependent.
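
The nested address counting and wrap behavior described for address control 320, together with the selection performed by multiplexer 370, can be sketched as follows. This is a simplified model under assumptions: the class and function names are hypothetical, and the counter order (column, then row, then bank, then bank group) follows the description above.

```python
class EcsAddressCounter:
    """Nested column/row/bank/bank-group counter that wraps after a full pass."""

    NAMES = ("column", "row", "bank", "bank_group")

    def __init__(self, columns, rows, banks, bank_groups):
        self.limits = (columns, rows, banks, bank_groups)
        self.value = [0, 0, 0, 0]

    def reset(self):
        # Models a RESET condition or the mode register reset bit.
        self.value = [0, 0, 0, 0]

    def current(self):
        return dict(zip(self.NAMES, self.value))

    def advance(self):
        """Step to the next code word.

        Returns (row_rolled_over, wrapped): the first flag is set when the
        column counter wrapped, the second when every counter wrapped.
        """
        for i, limit in enumerate(self.limits):
            self.value[i] += 1
            if self.value[i] < limit:
                return (i >= 1, False)
            self.value[i] = 0   # this level wrapped; carry into the next
        return (True, True)


def select_address(ecs_mode, internal_address, external_address):
    """Model of the mux: internal address in ECS mode, external otherwise."""
    return internal_address if ecs_mode else external_address
```

Each call to `advance` corresponds to one ECS write cycle; once `wrapped` is reported, the next ECS command starts the pass again from address zero.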
  • Table 1 provides one example of a listing of the number of commands required based on the number of code words for various memory configurations.
  • a DRAM controller keeps track of the number of ECS commands issued to the DRAM to error check all of the code words and check bits in the device.
  • Table 1 Number of Code Words per Bank Group (for 128 bit CW)
  • the number of 128-bit code words is 67,108,864, which would require an equal number of commands to cycle through all memory locations.
  • the number of 128-bit code words is 100,663,296.
  • the number of 128-bit code words is 134,217,728.
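
The counts above follow from dividing the number of data bits by the 128-bit code word size. The sketch below assumes the three counts correspond to 8, 12, and 16 Gbit of data; that correspondence is an inference from the arithmetic rather than something stated here, and the function name is hypothetical.

```python
CODE_WORD_BITS = 128  # data bits per code word, as in the text


def code_words_for_density(density_gbit):
    """Number of 128-bit code words, and so ECS commands, for one full pass.

    Assumes one ECS command checks exactly one code word and that the
    density is given in binary gigabits (2**30 bits).
    """
    return density_gbit * 2**30 // CODE_WORD_BITS
```

Under that assumption, `code_words_for_density(8)` gives 67,108,864, `code_words_for_density(12)` gives 100,663,296, and `code_words_for_density(16)` gives 134,217,728.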
  • System 300 includes ECC correction logic 330 to perform error checking and correction operations on selected or addressed memory location(s).
  • logic 330 generates an output flag when an error is detected in a memory location.
  • Logic 330 can generate the flag in response to detecting and correcting a single bit error (SBE) with ECC operations.
  • the error detection signal from logic 330 triggers ERC (error row counter) 332 and errors per row counter 344 to increment.
  • ERC 332 has a threshold number of memory locations that must trigger as having errors in a particular row or segment before counting the row as a row having errors (identified in some places as threshold N). For example, the threshold could be 1 error, and all rows having at least one error are counted.
  • the threshold could be 2 errors, and a row with only one error will not be counted, but all rows having at least 2 errors are counted. It will be understood that the threshold can be set based on system configuration considerations. Thus, the definition of what a "bad row" is can be set by the threshold for ERC 332. If the number of errors is one, ERC 332 may be unnecessary, and the output of ECC logic 330 can be fed directly into row error counter 342.
  • when the threshold number of errors per row has been reached as determined by ERC 332, ERC 332 outputs a signal to cause row error counter 342 to increment its count. Every error detected by logic 330 causes errors per row counter 344 to increment. It will be understood that ERC 332 and errors per row counter 344 generate information on a per-row or per-segment basis; thus, when the row address rolls over, address control 320 can trigger ERC 332 and errors per row counter 344 to reset their counts. Thus, in one embodiment, errors per row counter 344 increments each time a code word or a check bit error is detected and is reset each time the row counter wraps or rolls over.
  • row error counter 342 increments by one. In one embodiment, row error counter 342 increments by one for each two errors (or other number of errors configured) detected in a row.
  • Errors per row counter 344 tracks the total number of code words or check bit errors on a given row.
  • counter 344 provides its count for comparison to a previous high or maximum error count.
  • High error count 352 represents a register or other storage that holds a high error count indicating the maximum number of errors detected in any of the rows.
  • Comparison logic 350 determines if the count value in errors per row counter 344 is greater than high error count 352. In one embodiment, if the current error count in counter 344 is greater than the previous maximum of high error count 352, logic 350 generates an output that triggers high error address 354 to be set to the address of the current row. Additionally, high error count 352 is set to the count from errors per row counter 344. In one embodiment, the row code word or check bit error count is compared to the previous row code word or check bit error count to determine the row address with the highest error count within the DRAM.
  • address control 320 triggers a signal indicating that the last address has been reached or rolled over.
  • the signal triggers row error register 362 to load the count from row error counter 342, and row error counter 342 is reset.
  • system 300 includes a delay in propagating the signal to guarantee that register 362 latches the count from counter 342 prior to the counter being reset.
  • the signal indicating the last address has been reached also triggers error per row register 364 to load the count from high error count 352.
  • the signal indicating the last address has been reached also triggers error per row register 364 to load the address from high error address 354.
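
The counting behavior of system 300 can be sketched in software. This is a simplified model, not the circuit: the class and method names are hypothetical, ERC 332 is folded into a threshold comparison, and the comparison against the high error count is done once per row rather than on every error.

```python
class EcsErrorCounters:
    """Software model of the row error and errors-per-row counters."""

    def __init__(self, threshold=1):
        self.threshold = threshold          # errors needed to count a row
        self.rows_with_errors = 0           # row error counter 342
        self.errors_in_row = 0              # errors per row counter 344
        self.high_error_count = 0           # high error count 352
        self.high_error_address = None      # high error address 354
        self.row_error_register = 0         # row error register 362
        self.error_per_row_register = (0, None)  # error per row register 364

    def record_error(self):
        """Called for each error flagged by the ECC logic in the current row."""
        self.errors_in_row += 1
        if self.errors_in_row == self.threshold:
            self.rows_with_errors += 1      # row counted once, at threshold

    def end_of_row(self, row_address):
        """Called when the row address rolls over."""
        # Strictly greater: only the first row reaching the maximum is kept.
        if self.errors_in_row > self.high_error_count:
            self.high_error_count = self.errors_in_row
            self.high_error_address = row_address
        self.errors_in_row = 0

    def end_of_pass(self):
        """Called when the last address is reached: latch, then reset."""
        self.row_error_register = self.rows_with_errors
        self.error_per_row_register = (self.high_error_count,
                                       self.high_error_address)
        self.rows_with_errors = 0


def scan(errors_by_row, threshold=1):
    """Run one pass given a list of per-row error counts."""
    counters = EcsErrorCounters(threshold)
    for address, errors in enumerate(errors_by_row):
        for _ in range(errors):
            counters.record_error()
        counters.end_of_row(address)
    counters.end_of_pass()
    return counters
```

With a threshold of 2 and per-row error counts of 1, 3, 3, and 0, the model reports two rows with errors and a maximum of 3 errors at row address 1.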
  • Figure 4A is a block diagram of an embodiment of command encoding that enables an error check and scrub (ECS) mode.
  • Command encoding 410 represents an example of ECS command encoding in accordance with any embodiment of ECS that triggers an ECS mode with command encoding.
  • ECS command 412 illustrates one example of encoding, where the clock enable bits are set high, as well as ACT_n, CAS_n, and WE_n.
  • Chip select CS_n is set low along with RAS_n.
  • the bank address information (BA and BG) are set low, and the row address bits A11:A0 are held high.
  • A12 is set low. In such an encoding, the DRAM will internally generate the addresses for ECS operations.
  • command 412 can include certain address information.
  • FIG. 4B is a block diagram of an embodiment of a mode register that enables an error check and scrub (ECS) mode.
  • Mode register 420 represents an example of a mode register in a DRAM that supports an ECS mode. In one embodiment, what is illustrated as mode register 420 can be two separate mode registers.
  • address 424 (Az) represents a bit that triggers ECS mode, in accordance with any embodiment of ECS that triggers an ECS mode with a mode register.
  • Address 422 (Ay) represents a bit that triggers ECS counter reset. In an implementation where command encoding triggers ECS mode, address 424 may not be used, while address 422 could still provide the ability to reset ECS counters.
  • a '1' written to address 424 can trigger ECS mode to cause a memory device to perform ECC operations and count errors, while a '0' can exit ECS mode.
  • an ECS mode requires two additional mode register bits.
  • a first bit is address 422, which enables the clearing of counters and error result registers. The encoding could be reversed, but as illustrated, a '1' can clear the counters and result registers, and a '0' can initialize the registers and counters. In one embodiment, Ay must be written to '0' before a subsequent '1' can be applied to clear the counters and registers.
  • a second bit is not specifically illustrated, and is a mode register bit to enable the result registers. For example, in one embodiment, the mode register bit enables MPR Pages 4 to 7, where results are stored in MPR 4 and 5 (Pages 4 and 5).
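
The behavior of the Ay and Az bits can be sketched as a small model. The class name and the `clears` counter are hypothetical, and the sketch assumes the illustrated polarity, where a '1' in Az enables ECS mode and a 0-to-1 transition on Ay clears the counters and result registers.

```python
class EcsModeRegister:
    """Model of the two mode register bits related to ECS mode."""

    def __init__(self):
        self.ay = 0       # counter/result-register reset bit
        self.az = 0       # ECS mode enable bit
        self.clears = 0   # number of clear events seen (for illustration)

    def write(self, ay, az):
        # Ay must return to '0' before another '1' can trigger a clear,
        # so only a 0 -> 1 transition counts.
        if ay == 1 and self.ay == 0:
            self.clears += 1
        self.ay = ay
        self.az = az

    @property
    def ecs_mode(self):
        return self.az == 1
```

Writing Ay as '1' twice in a row produces a single clear in this model; a second clear requires an intervening write of '0'.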
  • FIG. 4C is a block diagram of an embodiment of a multipurpose register to store an address of a row with a maximum error count.
  • MPR page 430 represents a register that stores an address of a row with a maximum number of errors.
  • MPR page 430 can be MPR page 4 in a DDR4E implementation.
  • address BA1:BA0 provides an MPR location.
  • '00' can correspond to MPR0
  • '01' can correspond to MPR1
  • '10' can correspond to MPR2
  • '11' can correspond to MPR3.
  • the bits of MPR page 430 are allocated as: A[17:0] is the Row Address, BA[1:0] is the Bank Address, BG[2:0] is the Bank Group Address, and EC[5:0] are the number of code word and check bit errors, limited to a maximum of 64 errors (2^6). In one embodiment, only a first row having the bit-maximum number of errors will be recorded in MPR page 430, and if any other rows also have the maximum number of errors, they will not be specifically identified by address.
  • MPR page 430 stores the information for access by the host.
  • MPR page 430 is not automatically cleared after being read by the host, and should be read by the host each time a complete sequence of ECS commands is performed.
  • the register is reset to zeros by the host prior to a subsequent sequencing through the DRAM.
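
The field allocation of MPR page 430 can be illustrated with a pack and unpack sketch. The field widths come from the description above (18, 2, 3, and 6 bits, 29 bits in total); the bit order within the word, the split into four 8-bit MPR locations, and all function names are assumptions made only for illustration. A 6-bit field as modeled here holds values 0 through 63.

```python
# (name, width in bits); the ordering within the word is an assumption.
FIELDS = (("row", 18), ("bank", 2), ("bank_group", 3), ("error_count", 6))


def pack_mpr_page(**values):
    """Pack the four fields into one integer, lowest field first."""
    word, shift = 0, 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= value << shift
        shift += width
    return word


def unpack_mpr_page(word):
    """Recover the fields from a packed integer."""
    result, shift = {}, 0
    for name, width in FIELDS:
        result[name] = (word >> shift) & ((1 << width) - 1)
        shift += width
    return result


def split_into_mpr_locations(word):
    """Split the packed word into four bytes for MPR0..MPR3 (assumed layout)."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]
```

A host-side reader would perform the inverse of `split_into_mpr_locations` after reading the four MPR locations, then call `unpack_mpr_page` to obtain the row address and error count.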
  • FIG. 4D is a block diagram of an embodiment of a multipurpose register to store a count of a number of rows containing an error.
  • MPR page 440 represents a register that stores a count of the number of rows containing an error.
  • MPR page 440 can be MPR page 5 in a DDR4E implementation.
  • address BA1:BA0 provides an MPR location.
  • '00' can correspond to MPR0
  • '01' can correspond to MPR1
  • '10' can correspond to MPR2
  • '11' can correspond to MPR3.
  • MPR page 440 stores the information for access by the host.
  • MPR page 440 is not automatically cleared after being read by the host, and should be read by the host each time a complete sequence of ECS commands is performed.
  • the register is reset to zeros by the host prior to a subsequent sequencing through the DRAM.
  • Figure 5 is a block diagram of an embodiment of logic at a memory device that generates error correction information and supports an error check and scrub mode.
  • System 500 is one example of ECC component operation for a memory subsystem with a memory device having internal ECC that supports an ECS mode to count error information, in accordance with an embodiment described herein.
  • System 500 provides an example of internal ECC in a DRAM, which generates and stores internal check bits.
  • Host 510 includes a memory controller or equivalent or alternative circuit or component that manages access to memory 520.
  • Host 510 performs external ECC on data read from memory 520.
  • System 500 illustrates write path 532 in memory 520, which represents a path for data written from host 510 to memory 520.
  • Host 510 provides data 542 to memory 520 for writing to the memory array(s).
  • memory 520 generates check bits 544 with check bit generator 522 to store with the data in memory, which can be one example of internal ECC bits used for code word checking/correction.
  • Check bits 544 can enable memory 520 to correct an error that might occur in the writing to and reading from the memory array(s).
  • Data 542 and check bits 544 can be included as code word in 546, which is written to the memory resources. It will be understood that check bits 544 represent internal check bits within the memory device. In one embodiment, there is no write path to check bits 544.
  • Read path 534 represents a path for data read from memory 520 to host 510.
  • memory 520 fetches code word out 552 in response to a Read command from host 510.
  • the code word can include data 554 and check bits 556.
  • Data 554 and check bits 556 can correspond, respectively, to data 542 and check bits 544 written in write path 532, if the address location bits of the write and read commands are the same.
  • error correction in read path 534 can include the application of an XOR (exclusive OR) tree to a corresponding H matrix to detect errors and selectively correct errors (in the case of a single bit error).
  • an H matrix refers to a Hamming code parity-check matrix that shows how linear combinations of digits of the codeword equal zero.
  • the H matrix rows identify the coefficients of parity check equations that must be satisfied for a component or digit to be part of a codeword.
  • memory 520 includes syndrome decode 524, which enables the memory to apply check bits 556 to data 554 to detect errors in the read data. Syndrome decode 524 can generate syndrome 558 for use in generating appropriate error information for the read data. Data 554 can also be forwarded to error correction 528 for correction of a detected error.
  • syndrome decode 524 passes syndrome 558 to syndrome generator 526 to generate an error vector.
  • check bit generator 522 and syndrome generator 526 are fully specified by a corresponding H matrix for the memory device.
  • syndrome generator 526 can generate a signal to cause data 554 to be written back to the memory.
  • syndrome generator 526 can generate a CE (corrected error) signal with error location 564, which is a corrected error indication to error correction logic 528.
  • Error correction 528 can apply the corrected error to the specified location in data 554 to generate corrected data 566 for writing back to the memory.
  • the ECC can check and scrub errors in memory locations with ECC operations.
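
The check bit generation, syndrome computation, and single bit correction described for system 500 can be sketched with a textbook Hamming code. This is an illustration under assumptions, not the H matrix of any particular device: it uses the classic positional construction in which check bits sit at power-of-two positions, and all function names are hypothetical. For 128 data bits this construction needs 8 check bits, which matches the 128 data bits and 8 check bits mentioned earlier.

```python
def check_bit_count(data_bits):
    """Smallest r with 2**r >= data_bits + r + 1 (single error correction)."""
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r


def encode(data):
    """Return the code word (list of 0/1) for a list of data bits."""
    n = len(data) + check_bit_count(len(data))
    code = [0] * (n + 1)                 # index 0 unused; positions are 1-based
    bits = iter(data)
    for pos in range(1, n + 1):
        if pos & (pos - 1):              # not a power of two: data position
            code[pos] = next(bits)
    p = 1
    while p <= n:                        # fill each check bit position
        parity = 0
        for pos in range(1, n + 1):
            if pos & p and pos != p:
                parity ^= code[pos]
        code[p] = parity
        p <<= 1
    return code[1:]


def syndrome(code):
    """XOR of the 1-based positions of all set bits; zero means no error."""
    s = 0
    for pos, bit in enumerate(code, start=1):
        if bit:
            s ^= pos
    return s


def scrub(code):
    """Correct a single bit error; return (corrected code, error position)."""
    s = syndrome(code)
    if s == 0:
        return list(code), 0
    if s > len(code):
        raise ValueError("syndrome does not point at a single-bit error")
    fixed = list(code)
    fixed[s - 1] ^= 1                    # flip the bit the syndrome points at
    return fixed, s


def extract_data(code):
    """Return the data bits from a code word (drops check bit positions)."""
    return [bit for pos, bit in enumerate(code, start=1) if pos & (pos - 1)]
```

In this sketch `scrub` plays the role of the read, correct, and write back step: the returned code word is what would be written back to the array, and a nonzero position corresponds to a corrected error indication. A code of this kind corrects single bit errors only; a double bit error can yield a misleading syndrome.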
  • FIG. 6 is a flow diagram of an embodiment of a process for monitoring one or more error counts via error correction operations in an error check and scrub (ECS) mode.
  • the process for performing ECS operations can be performed in an embodiment of a memory subsystem that supports ECS mode as described herein.
  • a memory subsystem includes memory controller 602 and memory 604.
  • Memory 604 can be a memory device in accordance with any embodiment described herein.
  • the memory enters ECS mode in response to a trigger. In ECS mode, the memory device checks for errors and corrects them, while counting error information that the memory controller can read.
  • memory controller 602 writes data to the memory device with a write command, 612.
  • memory 604 can generate internal error correction, including generating internal check bits.
  • Memory 604 records or stores the data and corresponding ECC bits, 614.
  • Memory 604 can later use the ECC bits to perform ECC operations, such as ECC operations in an ECS mode in accordance with any embodiment described herein.
  • the memory controller determines to perform an error check and scrub (ECS), 620.
  • Memory controller 602 generates and sends an ECS mode trigger, 622.
  • the ECS mode trigger is an ECS command that is a command sent with specific encoding to cause memory 604 to perform ECS operations.
  • the ECS mode trigger is one or more bits of a mode register.
  • memory controller 602 can perform a mode register set command to place the memory in ECS mode.
  • the memory enters ECS mode, 624.
  • the memory generates internal address information for the ECS operations, 626.
  • memory 604 can include an internal controller that manages the ECS operations, including the generation of address information.
  • a controller can include one or more counters to track addresses for sequencing through the memory for ECS operation.
  • the memory generates internal signals to perform the ECS operations, 628.
  • the internal signals can include the specific operations to perform on the address locations for ECS mode, and can include generating and/or decoding address information for sequentially sequencing through the memory.
  • the ECS operations include reading, correcting, and writing back data to a specified memory location.
  • the memory reads one or more memory locations of a portion of the memory (such as a row) and performs error correction on the memory locations, 630.
  • the error correction can be ECC operations based on the stored ECC bits generated and stored with the data.
  • the memory counts the number of portions with errors, 632. In one embodiment, the memory counts an error for every error found within the portion (e.g., within any addressable memory location of the memory portion). In one embodiment, the memory only counts an error for portions that have at least a threshold number of errors (e.g., 2).
  • the memory can count an error for every N errors detected, or increment the count for every portion or segment having at least N errors.
  • the memory counts the number of errors per portion as well as the number of portions having errors, 634.
  • the number of errors per portion indicates a maximum number of errors in any segment.
  • the memory determines if a count of the number of errors per portion exceeds a previous maximum number of errors, 636. If the number of errors does not exceed the maximum, 638 NO branch, the memory can continue to sequence through the memory at 644. If the number of errors exceeds the previous maximum, 638 YES branch, in one embodiment, the memory stores the number of errors as the maximum, 640, and stores the address information for the portion with the maximum error count, 642.
  • the memory can determine if there are more memory portions to check and scrub, 644. If there are more portions to check, 646 YES branch, the memory can generate a subsequent address for ECS operations, 626. If there are no more portions to check, 646 NO branch, in one embodiment, the memory stores the error count and maximum error counts, 648. For example, the memory can store the error information in one or more registers to be available for access by memory controller 602. The memory can exit ECS mode after completion of all checking, 650. In one embodiment, the memory controller accesses the error information, 652.
  • FIG. 7 is a block diagram of an embodiment of a computing system in which an error check and scrub mode with error tracking can be implemented.
  • System 700 represents a computing device in accordance with any embodiment described herein, and can be a laptop computer, a desktop computer, a server, a gaming or entertainment control system, a scanner, copier, printer, routing or switching device, or other electronic device.
  • System 700 includes processor 720, which provides processing, operation management, and execution of instructions for system 700.
  • Processor 720 can include any type of
  • Processor 720 controls the overall operation of system 700, and can be or include one or more programmable general-purpose or special-purpose microprocessors, digital signal processors (DSPs), programmable controllers, application specific integrated circuits (ASICs), programmable logic devices (PLDs), or the like, or a combination of such devices.
  • Memory subsystem 730 represents the main memory of system 700, and provides temporary storage for code to be executed by processor 720, or data values to be used in executing a routine.
  • Memory subsystem 730 can include one or more memory devices such as read-only memory (ROM), flash memory, one or more varieties of random access memory (RAM), or other memory devices, or a combination of such devices.
  • Memory subsystem 730 stores and hosts, among other things, operating system (OS) 736 to provide a software platform for execution of instructions in system 700. Additionally, other instructions 738 are stored and executed from memory subsystem 730 to provide the logic and the processing of system 700. OS 736 and instructions 738 are executed by processor 720.
  • Memory subsystem 730 includes memory device 732 where it stores data, instructions, programs, or other items.
  • memory subsystem includes memory controller 734, which is a memory controller to generate and issue commands to memory device 732. It will be understood that memory controller 734 could be a physical part of processor 720.
  • Bus 710 is an abstraction that represents any one or more separate physical buses, communication lines/interfaces, and/or point-to-point connections, connected by appropriate bridges, adapters, and/or controllers. Therefore, bus 710 can include, for example, one or more of a system bus, a Peripheral Component Interconnect (PCI) bus, a HyperTransport or industry standard architecture (ISA) bus, a small computer system interface (SCSI) bus, a universal serial bus (USB), or an Institute of Electrical and Electronics Engineers (IEEE) standard 1394 bus (commonly referred to as "Firewire").
  • the buses of bus 710 can also correspond to interfaces in network interface 750.
  • System 700 also includes one or more input/output (I/O) interface(s) 740, network interface 750, one or more internal mass storage device(s) 760, and peripheral interface 770 coupled to bus 710.
  • I/O interface 740 can include one or more interface components through which a user interacts with system 700 (e.g., video, audio, and/or alphanumeric interfacing).
  • Network interface 750 provides system 700 the ability to communicate with remote devices (e.g., servers, other computing devices) over one or more networks.
  • Network interface 750 can include an Ethernet adapter, wireless interconnection components, USB (universal serial bus), or other wired or wireless standards-based or proprietary interfaces.
  • Storage 760 can be or include any conventional medium for storing large amounts of data in a nonvolatile manner, such as one or more magnetic, solid state, or optical based disks, or a combination.
  • Storage 760 holds code or instructions and data 762 in a persistent state (i.e., the value is retained despite interruption of power to system 700).
  • Storage 760 can be generically considered to be a "memory," although memory 730 is the executing or operating memory to provide instructions to processor 720. Whereas storage 760 is nonvolatile, memory 730 can include volatile memory (i.e., the value or state of the data is indeterminate if power is interrupted to system 700).
  • Peripheral interface 770 can include any hardware interface not specifically mentioned above. Peripherals refer generally to devices that connect dependently to system 700. A dependent connection is one where system 700 provides the software and/or hardware platform on which operation executes, and with which a user interacts.
  • memory 732 is a DRAM.
  • processor 720 represents one or more processors that execute data stored in one or more DRAM memories 732.
  • network interface 750 exchanges data with another device in another network location, and the data is data stored in memory 732.
  • system 700 includes ECS control 780 to manage ECS mode operation for the system.
  • ECS control 780 represents ECS logic to perform ECS operations to check and correct errors internally within memory 732, and count error information in accordance with any embodiment described herein.
  • FIG. 8 is a block diagram of an embodiment of a mobile device in which an error check and scrub mode with error tracking can be implemented.
  • Device 800 represents a mobile computing device, such as a computing tablet, a mobile phone or smartphone, a wireless-enabled e-reader, wearable computing device, or other mobile device. It will be understood that certain of the components are shown generally, and not all components of such a device are shown in device 800.
  • Device 800 includes processor 810, which performs the primary processing operations of device 800.
  • Processor 810 can include one or more physical devices, such as microprocessors, application processors, microcontrollers, programmable logic devices, or other processing means.
  • the processing operations performed by processor 810 include the execution of an operating platform or operating system on which applications and/or device functions are executed.
  • the processing operations include operations related to I/O
  • processing operations can also include operations related to audio I/O and/or display I/O.
  • device 800 includes audio subsystem 820, which represents hardware (e.g., audio hardware and audio circuits) and software (e.g., drivers, codecs) components associated with providing audio functions to the computing device. Audio functions can include speaker and/or headphone output, as well as microphone input. Devices for such functions can be integrated into device 800, or connected to device 800. In one embodiment, a user interacts with device 800 by providing audio commands that are received and processed by processor 810.
  • Display subsystem 830 represents hardware (e.g., display devices) and software (e.g., drivers) components that provide a visual and/or tactile display for a user to interact with the computing device.
  • Display subsystem 830 includes display interface 832, which includes the particular screen or hardware device used to provide a display to a user.
  • display interface 832 includes logic separate from processor 810 to perform at least some processing related to the display.
  • display subsystem 830 includes a touchscreen device that provides both output and input to a user.
  • display subsystem 830 includes a high definition (HD) display that provides an output to a user. High definition can refer to a display having a pixel density of
  • PPI pixels per inch
  • formats such as full HD (e.g., 1080p), retina displays, 4K (ultra high definition or UHD), or others.
  • I/O controller 840 represents hardware devices and software components related to interaction with a user. I/O controller 840 can operate to manage hardware that is part of audio subsystem 820 and/or display subsystem 830. Additionally, I/O controller 840 illustrates a connection point for additional devices that connect to device 800 through which a user might interact with the system. For example, devices that can be attached to device 800 might include microphone devices, speaker or stereo systems, video systems or other display device, keyboard or keypad devices, or other I/O devices for use with specific applications such as card readers or other devices.
  • I/O controller 840 can interact with audio subsystem 820 and/or display subsystem 830.
  • input through a microphone or other audio device can provide input or commands for one or more applications or functions of device 800.
  • audio output can be provided instead of or in addition to display output.
  • display subsystem includes a touchscreen
  • the display device also acts as an input device, which can be at least partially managed by I/O controller 840.
  • I/O controller 840 manages devices such as accelerometers, cameras, light sensors or other environmental sensors, gyroscopes, global positioning system (GPS), or other hardware that can be included in device 800.
  • the input can be part of direct user interaction, as well as providing environmental input to the system to influence its operations (such as filtering for noise, adjusting displays for brightness detection, applying a flash for a camera, or other features).
  • device 800 includes power management 850 that manages battery power usage, charging of the battery, and features related to power saving operation.
  • Memory subsystem 860 includes memory device(s) 862 for storing information in device 800.
  • Memory subsystem 860 can include nonvolatile (state does not change if power to the memory device is interrupted) and/or volatile (state is indeterminate if power to the memory device is interrupted) memory devices.
  • Memory 860 can store application data, user data, music, photos, documents, or other data, as well as system data (whether long-term or temporary) related to the execution of the applications and functions of system 800.
  • memory subsystem 860 includes memory controller 864 (which could also be considered part of the control of system 800, and could potentially be considered part of processor 810).
  • Memory controller 864 includes a scheduler to generate and issue commands to memory device 862.
  • Connectivity 870 includes hardware devices (e.g., wireless and/or wired connectors and communication hardware) and software components (e.g., drivers, protocol stacks) to enable device 800 to communicate with external devices.
  • the external devices could be separate devices, such as other computing devices, wireless access points or base stations, as well as peripherals such as headsets, printers, or other devices.
  • Connectivity 870 can include multiple different types of connectivity.
  • device 800 is illustrated with cellular connectivity 872 and wireless connectivity 874.
  • Cellular connectivity 872 refers generally to cellular network connectivity provided by wireless carriers, such as provided via GSM (global system for mobile communications) or variations or derivatives, CDMA (code division multiple access) or variations or derivatives, TDM (time division multiplexing) or variations or derivatives, LTE (long term evolution - also referred to as "4G"), or other cellular service standards.
  • Wireless connectivity 874 refers to wireless connectivity that is not cellular, and can include personal area networks (such as Bluetooth), local area networks (such as WiFi), and/or wide area networks (such as WiMax), or other wireless communication.
  • Wireless communication refers to transfer of data through the use of modulated electromagnetic radiation through a non-solid medium. Wired communication occurs through a solid communication medium.
  • Peripheral connections 880 include hardware interfaces and connectors, as well as software components (e.g., drivers, protocol stacks) to make peripheral connections. It will be understood that device 800 could both be a peripheral device ("to" 882) to other computing devices, as well as have peripheral devices ("from" 884) connected to it. Device 800 commonly has a "docking" connector to connect to other computing devices for purposes such as managing (e.g., downloading and/or uploading, changing, synchronizing) content on device 800. Additionally, a docking connector can allow device 800 to connect to certain peripherals that allow device 800 to control content output, for example, to audiovisual or other systems.
  • device 800 can make peripheral connections 880 via common or standards-based connectors.
  • Common types can include a Universal Serial Bus (USB) connector (which can include any of a number of different hardware interfaces), DisplayPort including MiniDisplayPort (MDP), High Definition Multimedia Interface (HDMI), Firewire, or other type.
  • memory 862 is a DRAM.
  • processor 810 represents one or more processors that execute data stored in one or more DRAM memories 862.
  • system 800 includes ECS control 890 to manage ECS mode operation for the system.
  • ECS control 890 represents ECS logic to perform ECS operations to check and correct errors internally within memory 862, and count error information in accordance with any embodiment described herein.
  • a dynamic random access memory device includes: a storage array including multiple memory segments, the memory segments including multiple memory locations to store data and error checking and correction (ECC) information associated with the data; I/O (input/output) circuitry to couple to an associated memory controller, the I/O circuitry to receive a trigger for an error check and scrub (ECS) mode when coupled to the associated memory controller; and an internal controller on the DRAM, responsive to the trigger for the ECS mode, to read one or more memory locations, perform ECC for the one or more memory locations based on the ECC information, and count error information, the error information including a segment count indicating a number of segments having N or more errors, and a maximum count indicating a maximum number of errors in any segment.
  • the DRAM includes a double data rate version 4 extended (DDR4E) compliant synchronous dynamic random access memory device (SDRAM).
  • the memory segments comprise DRAM rows.
  • the trigger comprises an ECS command generated by the memory controller.
  • the trigger comprises a mode register setting set by the memory controller.
  • the internal controller is further to generate address information for the memory locations responsive to the trigger for the ECS mode.
  • memory controller is to identify a bank group associated with the trigger for the ECS mode, and wherein the internal controller is further to generate address information for specific rows within the bank group.
  • the internal controller is to perform single bit error (SBE) ECC for the one or more memory locations in response to the trigger for ECS mode.
  • N equals 1. In one embodiment, N equals 2. In one embodiment, further comprising the internal controller to store the segment count in a multipurpose register (MPR) of a mode register of the DRAM. In one embodiment, further comprising the internal controller to store an address of a segment having the maximum count, wherein the internal controller is to store the address in a multipurpose register (MPR) of a mode register of the DRAM.
  • a method for error correction management in a memory subsystem includes: receiving a trigger for an error check and scrub (ECS) mode at a memory device having a storage array including multiple memory segments, the memory segments including multiple memory locations to store data and error checking and correction (ECC) information associated with the data; reading one or more of the memory locations responsive to receiving the trigger for the ECS mode; performing ECC for the one or more memory locations based on the ECC information; and counting error information, the error information including a segment count indicating a number of segments having at least a threshold number of errors, and a maximum count indicating a maximum number of errors in any segment.
  • the memory device includes a double data rate version 4 extended (DDR4E) compliant synchronous dynamic random access memory device
  • receiving the trigger comprises receiving an ECS command generated by the memory controller. In one embodiment, receiving the trigger comprises receiving a mode register setting set by the memory controller. In one embodiment, further comprising generating address information for the memory locations responsive to the trigger for the ECS mode. In one embodiment, further comprising identifying a bank group associated with the trigger for the ECS mode, and generating address information for specific rows within the bank group. In one embodiment, performing ECC comprises performing single bit error (SBE) ECC for the one or more memory locations in response to the trigger for ECS mode. In one embodiment, the threshold number equals 1. In one embodiment, the threshold number equals 2.
  • a system with a memory subsystem includes: a memory controller; and multiple double data rate version 4 extended (DDR4E) synchronous dynamic random access memory devices (SDRAMs) including a storage array including multiple memory segments, the memory segments including multiple memory locations to store data and error checking and correction (ECC) information associated with the data; I/O (input/output) circuitry coupled to the memory controller, the I/O circuitry to receive a trigger for an error check and scrub (ECS) mode from the memory controller; and an internal controller, responsive to the trigger for the ECS mode, to read one or more memory locations, perform ECC for the one or more memory locations based on the ECC information, and count error information, the error information including a segment count indicating a number of segments having N or more errors, and a maximum count indicating a maximum number of errors in any segment.
  • the DRAM includes a double data rate version 4 extended (DDR4E) compliant synchronous dynamic random access memory device (SDRAM).
  • the memory segments comprise DRAM rows.
  • the trigger comprises an ECS command generated by the memory controller.
  • the trigger comprises a mode register setting set by the memory controller.
  • the internal controller is further to generate address information for the memory locations responsive to the trigger for the ECS mode.
  • memory controller is to identify a bank group associated with the trigger for the ECS mode, and wherein the internal controller is further to generate address information for specific rows within the bank group.
  • the internal controller is to perform single bit error (SBE) ECC for the one or more memory locations in response to the trigger for ECS mode.
  • N equals 1. In one embodiment, N equals 2. In one embodiment, further comprising the internal controller to store the segment count in a multipurpose register (MPR) of a mode register of the DRAM. In one embodiment, further comprising the internal controller to store the segment count in a multipurpose register (MPR) of a mode register of the SDRAM, and to store an address of a segment having the maximum count. In one embodiment, further comprising one or more of: at least one processor communicatively coupled to the memory controller; a display communicatively coupled to at least one processor; or a network interface communicatively coupled to at least one processor.
  • Flow diagrams as illustrated herein provide examples of sequences of various process actions.
  • the flow diagrams can indicate operations to be executed by a software or firmware routine, as well as physical operations.
  • a flow diagram can illustrate the state of a finite state machine (FSM), which can be implemented in hardware and/or software.
  • a machine readable storage medium can cause a machine to perform the functions or operations described, and includes any mechanism that stores information in a form accessible by a machine (e.g., computing device, electronic system, etc.), such as recordable/non-recordable media (e.g., read only memory (ROM), random access memory (RAM), magnetic disk storage media, optical storage media, flash memory devices, etc.).
  • a communication interface includes any mechanism that interfaces to any of a hardwired, wireless, optical, etc., medium to communicate to another device, such as a memory bus interface, a processor bus interface, an Internet connection, a disk controller, etc.
  • the communication interface can be configured by providing configuration parameters and/or sending signals to prepare the communication interface to provide a data signal describing the software content.
  • the communication interface can be accessed via one or more commands or signals sent to the communication interface.
  • Various components described herein can be a means for performing the operations or functions described. Each component described herein includes software, hardware, or a combination of these.
  • the components can be implemented as software modules, hardware modules, special-purpose hardware (e.g., application specific hardware, application specific integrated circuits (ASICs), digital signal processors (DSPs), etc.), embedded controllers, hardwired circuitry, etc.

Abstract

An error check and scrub (ECS) mode enables a memory device to perform error checking and correction (ECC) and count errors. An associated memory controller triggers the ECS mode with a trigger sent to the memory device. The memory device includes multiple addressable memory locations, which can be organized in segments such as wordlines. The memory locations store data and have associated ECC information. In the ECS mode, the memory device reads one or more memory locations and performs ECC for the one or more memory locations based on the ECC information. The memory device counts error information including a segment count indicating a number of segments having at least a threshold number of errors, and a maximum count indicating a maximum number of errors in any segment.

Description

MEMORY DEVICE ERROR CHECK AND SCRUB MODE AND ERROR TRANSPARENCY
RELATED APPLICATIONS
[0001] This patent application is a nonprovisional application based on U.S. Provisional Application No. 62/211,448, filed August 28, 2015. This application claims the benefit of priority of that provisional application. The provisional application is hereby incorporated by reference.
[0002] The present patent application is related to the following two patent applications, which also claim priority to the same U.S. Provisional Application identified above: Patent Application No. TBD [P88609], entitled "MEMORY DEVICE CHECK BIT READ MODE"; and, Patent Application No. TBD [P93260], entitled "MEMORY DEVICE ON-DIE ECC (ERROR CHECKING AND CORRECTING) CODE"; both filed concurrently herewith.
FIELD
[0003] The descriptions are generally related to memory management, and more particular descriptions are related to error checking and correction in a memory subsystem with a memory device that performs internal error checking and correction.
COPYRIGHT NOTICE/PERMISSION
[0004] Portions of the disclosure of this patent document may contain material that is subject to copyright protection. The copyright owner has no objection to the reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever. The copyright notice applies to all data as described below, and in the accompanying drawings hereto, as well as to any software described below: Copyright © 2015, Intel Corporation, All Rights Reserved.
BACKGROUND
[0005] Volatile memory resources find widespread usage in current computing platforms, whether for servers, desktop or laptop computers, mobile devices, and consumer and business electronics. DRAM (dynamic random access memory) devices are the most common types of memory devices in use. DRAM errors are projected to increase as the manufacturing processes to produce the DRAMs continue to scale to smaller geometries. One technique for addressing the increasing DRAM errors is to employ on-die ECC (error checking and correction). On-die ECC refers to error detection and correction logic that resides on the memory device itself. With on-die ECC logic, a DRAM can correct single bit failures, such as through a single error correction (SEC). On-die ECC can be used in addition to system level ECC, but the system level ECC has no insight into what error correction has been performed at the memory device level. Thus, while on-die ECC can handle errors inside a memory device, errors can accumulate undetected by the host system.
BRIEF DESCRIPTION OF THE DRAWINGS
[0006] The following description includes discussion of figures having illustrations given by way of example of implementations of embodiments of the invention. The drawings should be understood by way of example, and not by way of limitation. As used herein, references to one or more "embodiments" are to be understood as describing a particular feature, structure, and/or characteristic included in at least one implementation of the invention. Thus, phrases such as "in one embodiment" or "in an alternate embodiment" appearing herein describe various embodiments and implementations of the invention, and do not necessarily all refer to the same embodiment. However, they are also not necessarily mutually exclusive.
[0007] Figure 1 is a block diagram of an embodiment of a system in which a memory device monitors errors with an error check mode.
[0008] Figure 2 is a block diagram of an embodiment of a system that performs internal error correction and stores error information.
[0009] Figure 3 is a block diagram of an embodiment of a system in which a register stores a number of rows with errors, and a maximum error for any row.
[0010] Figure 4A is a block diagram of an embodiment of command encoding that enables an error check and scrub (ECS) mode.
[0011] Figure 4B is a block diagram of an embodiment of a mode register that enables an error check and scrub (ECS) mode.
[0012] Figure 4C is a block diagram of an embodiment of a multipurpose register to store an address of a row with a maximum error count.
[0013] Figure 4D is a block diagram of an embodiment of a multipurpose register to store a count of a number of rows containing an error.
[0014] Figure 5 is a block diagram of an embodiment of logic at a memory device that generates error correction information and supports an error check and scrub mode.
[0015] Figure 6 is a flow diagram of an embodiment of a process for monitoring one or more error counts via error correction operations in an error check and scrub (ECS) mode.
[0016] Figure 7 is a block diagram of an embodiment of a computing system in which an error check and scrub mode with error tracking can be implemented.
[0017] Figure 8 is a block diagram of an embodiment of a mobile device in which an error check and scrub mode with error tracking can be implemented.
[0018] Descriptions of certain details and implementations follow, including a description of the figures, which may depict some or all of the embodiments described below, as well as discussing other potential embodiments or implementations of the inventive concepts presented herein.
DETAILED DESCRIPTION
[0019] As described herein, a memory device mode enables error monitoring. The error monitoring mode can be referred to by any label. An error monitoring mode as described herein enables the performing of error checking and correction (ECC) and the counting of a total number of memory segments having errors as well as a segment having the highest number of errors. Typically a segment or portion would be a row of the memory device, where each data chunk within the row (e.g., a prefetch size of data) can be tested for an error. ECC refers herein to any process and/or operation or group of operations to check the validity of stored data based on error checking and correction data, and to perform some form of error correction routine based on the operation or process.
[0020] In one embodiment, the error monitoring mode is an error check and scrub (ECS) mode to enable a DRAM (dynamic random access memory device) to perform one or more ECC operations, and count errors. Again, such error monitoring mode can be referred to by any name, but ECS mode is used herein for simplicity by way of example, and is not limiting. A memory controller associated with the memory device triggers the ECS mode with a trigger sent to the memory device. Thus, the host can control the mode and receive error count information. The memory device includes multiple addressable memory locations, which can be organized in segments such as wordlines or rows or other portions. The memory locations store data and have associated ECC information. In the ECS mode, the memory device reads one or more memory locations and performs ECC for one or more memory locations based on ECC information stored within the memory device. Thus, the memory device performs internal ECC in the ECS mode. The memory device counts error information including a segment count indicating a number of segments having at least a threshold number of errors, and a maximum count indicating a maximum number of errors in any segment.
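The interaction in the preceding paragraph, in which the controller sends a trigger, the device checks and counts internally, and the host then reads the counts, can be illustrated with a toy model. This is only a sketch: the class `EcsDeviceModel` and its methods `trigger_ecs` and `read_error_info` are hypothetical names, the per-segment error counts are supplied directly rather than derived from stored ECC bits, and every error is assumed to be correctable so that a scrub clears it.

```python
class EcsDeviceModel:
    """Toy model of a memory device with an ECS mode (all names hypothetical)."""

    def __init__(self, errors_per_segment, threshold=1):
        # errors_per_segment[i] stands in for what internal ECC would find
        # when reading segment i; a real device derives this from ECC bits.
        self._errors = list(errors_per_segment)
        self._threshold = threshold
        self._registers = {"segment_count": 0, "max_count": 0}

    def trigger_ecs(self):
        """Model the device's response to the ECS trigger from the controller."""
        segment_count = 0
        max_count = 0
        for i, errors in enumerate(self._errors):
            if errors >= self._threshold:
                segment_count += 1
            max_count = max(max_count, errors)
            # Assumption: all errors are correctable, so scrubbing clears them.
            self._errors[i] = 0
        self._registers = {"segment_count": segment_count, "max_count": max_count}

    def read_error_info(self):
        """Host-visible readout of the counts gathered in ECS mode."""
        return dict(self._registers)


# Usage: five segments, two of which hold errors (made-up values).
device = EcsDeviceModel([0, 1, 0, 2, 0], threshold=1)
device.trigger_ecs()
first = device.read_error_info()
device.trigger_ecs()
second = device.read_error_info()
print(first, second)
```

In this model a second pass reports zero for both counts, because the first pass scrubbed the errors; new errors accumulating between passes would show up as nonzero counts again.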
[0021] Reference to memory devices can apply to different memory types. Memory devices generally refer to volatile memory technologies. Volatile memory is memory whose state (and therefore the data stored on it) is indeterminate if power is interrupted to the device. Nonvolatile memory refers to memory whose state is determinate even if power is interrupted to the device. Dynamic volatile memory requires refreshing the data stored in the device to maintain state. One example of dynamic volatile memory includes DRAM (dynamic random access memory), or some variant such as synchronous DRAM (SDRAM). A memory subsystem as described herein may be compatible with a number of memory technologies, such as DDR3 (double data rate version 3, original release by JEDEC (Joint Electron Device Engineering Council) on June 27, 2007, currently on release 21), DDR4 (DDR version 4, initial specification published in September 2012 by JEDEC), DDR4E (DDR version 4, extended, currently in discussion by JEDEC), LPDDR3 (low power DDR version 3, JESD209-3B, Aug 2013 by JEDEC), LPDDR4 (LOW POWER DOUBLE DATA RATE (LPDDR) version 4, JESD209-4, originally published by JEDEC in August 2014), WIO2 (Wide I/O 2 (WideIO2), JESD229-2, originally published by JEDEC in August 2014), HBM (HIGH BANDWIDTH MEMORY DRAM, JESD235, originally published by JEDEC in October 2013), DDR5 (DDR version 5, currently in discussion by JEDEC), LPDDR5 (currently in discussion by JEDEC), HBM2 (HBM version 2, currently in discussion by JEDEC), and/or others, and technologies based on derivatives or extensions of such specifications.
[0022] In addition to, or alternatively to, volatile memory, in one embodiment, reference to memory devices can refer to a nonvolatile memory device whose state is determinate even if power is interrupted to the device. In one embodiment, the nonvolatile memory device is a block addressable memory device, such as NAND or NOR technologies. Thus, a memory device can also include future generation nonvolatile devices, such as a three dimensional crosspoint memory device, other byte addressable nonvolatile memory devices, or memory devices that use chalcogenide phase change material (e.g., chalcogenide glass). In one embodiment, the memory device can be or include multi-threshold level NAND flash memory, NOR flash memory, single or multi-level Phase Change Memory (PCM), a resistive memory, nanowire memory, ferroelectric transistor random access memory (FeTRAM), magnetoresistive random access memory (MRAM), memory that incorporates memristor technology, or spin transfer torque (STT)-MRAM, or a combination of any of the above, or other memory.
[0023] Descriptions herein referring to a "DRAM" can apply to any memory device that allows random access, whether volatile or nonvolatile. The memory device or DRAM can refer to the die itself and/or to a packaged memory product.
[0024] Whereas in traditional memory subsystems errors can accumulate undetected within a DRAM because of on-die or internal ECC, as described herein a memory device can count errors and expose the error information to a host system. On-die or internal ECC within a memory device typically refers to error checking and correction of single bit errors (SBEs) in the memory array. In one embodiment, a DRAM compatible with DDR4E or a variant or extension can apply ECS combined with error counting or monitoring. Thus, one embodiment of a DDR4E DRAM can scrub errors and track the error accumulation of the device. In one embodiment, using ECS, a memory device keeps an error count indicating a number of rows that have at least one error, and tracks a row address having the highest number of errors.
[0025] In one embodiment, DDR4E devices incorporate on-die ECC and include rows or wordlines having multiple addressable locations of 128-bit data chunks. The data chunks can have associated ECC bits (e.g., 8 bits of internal error correction data). ECS mode can provide transparency support for such DDR4E devices. In one embodiment, ECS mode includes an error check and scrub operation that incorporates an error count mechanism as part of the ECC operation. ECS mode can enable a DRAM to internally read, correct SBEs, and write back corrected data to the array. Such reading, correcting, and writing back can be referred to as "scrubbing" errors.
[0026] In one embodiment, a memory device that supports ECS mode includes one or more registers to store error count information. For example, the registers can be DRAM Mode Registers. The register locations can include multipurpose registers (MPRs) where the memory device can store error count information including a number of segments having errors, and a maximum count of errors and/or an address of a segment having a maximum error count. In one embodiment, a memory device uses two registers in ECS mode to track code word and check bit errors detected during the ECS mode operation. In one embodiment, the memory device stores a value from a Row Error Counter to one register, and stores a value from an Errors per Row Counter to another register. In one embodiment, a row error counter tracks the number of rows that have at least a threshold number of code word and check bit errors detected. The threshold number can be one error. The threshold number can be two errors or some other number of errors. In one embodiment, the errors per row counter tracks the address of the row with the largest number of code word and check bit errors, and can include the code word and check bit error count for that row.
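The two counters described in paragraph [0026] can be illustrated with a minimal Python sketch. It is not the device's implementation; the class name `EcsCounters`, the method `record_row`, and the helper `scrub_summary` are hypothetical, and the sketch assumes the scrub pass already knows how many errors it found in each row.

```python
from dataclasses import dataclass


@dataclass
class EcsCounters:
    """Hypothetical model of the two ECS registers (names are illustrative)."""
    threshold: int = 1          # minimum errors for a row to be counted
    row_error_count: int = 0    # Row Error Counter: rows at or above threshold
    max_errors: int = 0         # Errors per Row Counter: largest count seen
    max_row_address: int = -1   # address of the row with the largest count

    def record_row(self, row_address: int, errors_in_row: int) -> None:
        """Update both counters after one row has been checked."""
        if errors_in_row >= self.threshold:
            self.row_error_count += 1
        if errors_in_row > self.max_errors:
            self.max_errors = errors_in_row
            self.max_row_address = row_address


def scrub_summary(errors_by_row, threshold=1):
    """Feed per-row error counts (index = row address) through the counters."""
    counters = EcsCounters(threshold=threshold)
    for row_address, errors in enumerate(errors_by_row):
        counters.record_row(row_address, errors)
    return counters
```

For example, `scrub_summary([0, 2, 0, 5, 1], threshold=2)` counts two rows at or above the threshold and records row 3 as the row with the most errors.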
[0027] In one embodiment, the memory controller reads the error count information stored by the memory device in the registers. The host system (e.g., the host operating system and/or the host CPU (central processing unit)) can utilize the error count information to improve system-level RAS (reliability, availability, and serviceability) of the memory subsystem. Thus, by the memory subsystem providing transparency into the error count information on the memory devices, the memory devices themselves do not need to expose specific error information, but can provide enough information to improve error correction at the system level. In one embodiment, a memory controller can extract multibit error information from the error information, and apply the multibit information to determine how to apply ECC to the system (e.g., by knowing where errors occurred in the memory). In one embodiment, the memory controller uses the error information from the memory device as metadata for improving SDDC (single device data correction) ECC operations targeting multibit errors.
[0028] Figure 1 is a block diagram of an embodiment of a system in which a memory device monitors errors with an error check mode. System 100 includes elements of a memory subsystem in a computing device. Processor 110 represents a processing unit of a host computing platform that executes an operating system (OS) and applications, which can collectively be referred to as a "host" for the memory. The OS and applications execute operations that result in memory accesses. Processor 110 can include one or more separate processors. Each separate processor can include a single and/or a multicore processing unit. The processing unit can be a primary processor such as a CPU (central processing unit) and/or a peripheral processor such as a GPU (graphics processing unit). System 100 can be implemented as an SOC, or be implemented with standalone components.
[0029] Memory controller 120 represents one or more memory controller circuits or devices for system 100. Memory controller 120 represents control logic that generates memory access commands in response to the execution of operations by processor 110. Memory controller 120 accesses one or more memory devices 140. Memory devices 140 can be DRAMs in accordance with any of the technologies referred to above. In one embodiment, memory devices 140 are organized and managed as different channels, where each channel couples to buses and signal lines that couple to multiple memory devices in parallel. Each channel is independently operable. Thus, each channel is independently accessed and controlled, and the timing, data transfer, command and address exchanges, and other operations are separate for each channel. In one embodiment, settings for each channel are controlled by separate mode register or other register settings. In one embodiment, each memory controller 120 manages a separate memory channel, although system 100 can be configured to have multiple channels managed by a single controller, or to have multiple controllers on a single channel. In one embodiment, memory controller 120 is part of host processor 110, such as logic implemented on the same die or implemented in the same package space as the processor.
[0030] Memory controller 120 includes I/O interface logic 122 to couple to a system bus. I/O interface logic 122 (as well as I/O 142 of memory device 140) can include pins, connectors, signal lines, and/or other hardware to connect the devices. I/O interface logic 122 can include a hardware interface. As illustrated, I/O interface logic 122 includes at least drivers/transceivers for signal lines. Typically, wires within an integrated circuit interface with a pad or connector to interface to signal lines or traces between devices. I/O interface logic 122 can include drivers, receivers, transceivers, termination, and/or other circuitry to send and/or receive signals on the signal lines between the devices. The system bus can be implemented as multiple signal lines coupling memory controller 120 to memory devices 140. The system bus includes at least clock (CLK) 132, command/address (CMD) 134, data (DQ) 136, and other signal lines 138. The signal lines for CMD 134 can be referred to as a "C/A bus" (or ADD/CMD bus, or some other designation indicating the transfer of commands and address information) and the signal lines for DQ 136 can be referred to as a "data bus." In one embodiment, independent channels have different clock signals, C/A buses, data buses, and other signal lines. Thus, system 100 can be considered to have multiple "system buses," in the sense that an independent interface path can be considered a separate system bus. It will be understood that in addition to the lines explicitly shown, a system bus can include strobe signaling lines, alert lines, auxiliary lines, and other signal lines.
[0031] It will be understood that the system bus includes a data bus (DQ 136) configured to operate at a bandwidth. Based on design and/or implementation of system 100, DQ 136 can have more or less bandwidth per memory device 140. For example, DQ 136 can support memory devices that have either a x32 interface, a x16 interface, a x8 interface, or other interface. The convention "xN," where N is a binary integer, refers to an interface size of memory device 140, which represents a number of signal lines DQ 136 that exchange data with memory controller 120. The interface size of the memory devices is a controlling factor on how many memory devices can be used concurrently per channel in system 100 or coupled in parallel to the same signal lines.
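The relationship between interface size and the number of devices per channel can be shown with a short arithmetic sketch. The 64-bit channel data width used in the example is an assumption chosen for illustration and is not stated in the description; the function name `devices_per_channel` is likewise hypothetical.

```python
def devices_per_channel(channel_width_bits: int, device_width_bits: int) -> int:
    """Number of xN devices needed to fill a data bus of the given width."""
    if device_width_bits <= 0 or channel_width_bits % device_width_bits != 0:
        raise ValueError("channel width must be a whole multiple of device width")
    return channel_width_bits // device_width_bits


if __name__ == "__main__":
    # Assumed 64-bit data bus; the narrower the device, the more devices in parallel.
    for n in (8, 16, 32):
        print(f"x{n}: {devices_per_channel(64, n)} devices in parallel")
```

Under that assumption, a channel would be populated by eight x8 devices, four x16 devices, or two x32 devices.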
[0032] Memory devices 140 represent memory resources for system 100. In one embodiment, each memory device 140 is a separate memory die, which can include multiple (e.g., 2) channels per die. Each memory device 140 includes I/O interface logic 142, which has a bandwidth determined by the implementation of the device (e.g., x16 or x8 or some other interface bandwidth), and enables the memory devices to interface with memory controller 120. I/O interface logic 142 can include a hardware interface, and can be in accordance with I/O 122 of memory controller 120, but at the memory device end. In one embodiment, multiple memory devices 140 are connected in parallel to the same data buses. For example, system 100 can be configured with multiple memory devices 140 coupled in parallel, with each memory device responding to a command, and accessing memory resources 160 internal to each. For a Write operation, an individual memory device 140 can write a portion of the overall data word, and for a Read operation, an individual memory device 140 can fetch a portion of the overall data word.
[0033] In one embodiment, memory devices 140 are disposed directly on a motherboard or host system platform (e.g., a PCB (printed circuit board) on which processor 110 is disposed) of a computing device. In one embodiment, memory devices 140 can be organized into memory modules 130. In one embodiment, memory modules 130 represent dual inline memory modules (DIMMs). In one embodiment, memory modules 130 represent other organization of multiple memory devices to share at least a portion of access or control circuitry, which can be a separate circuit, a separate device, or a separate board from the host system platform. Memory modules 130 can include multiple memory devices 140, and the memory modules can include support for multiple separate channels to the included memory devices disposed on them.
[0034] Memory devices 140 each include memory resources 160. Memory resources 160 represent individual arrays of memory locations or storage locations for data. Typically memory resources 160 are managed as rows of data, accessed via wordline (rows) and bitline (individual bits within a row) control. Memory resources 160 can be organized as separate channels, ranks, and banks of memory. Channels are independent control paths to storage locations within memory devices 140. Ranks refer to common locations across multiple memory devices (e.g., same row addresses within different devices). Banks refer to arrays of memory locations within a memory device 140. In one embodiment, banks of memory are divided into sub-banks with at least a portion of shared circuitry for the sub-banks.
[0035] In one embodiment, memory devices 140 include one or more registers 144. Registers 144 represent storage devices or storage locations that provide configuration or settings for the operation of the memory device. In one embodiment, registers 144 can provide a storage location for memory device 140 to store data for access by memory controller 120 as part of a control or management operation. In one embodiment, registers 144 include Mode Registers. In one embodiment, registers 144 include multipurpose registers. The configuration of locations within register 144 can configure memory device 140 to operate in different "modes," where command and/or address information or signal lines can trigger different operations within memory device 140 depending on the mode. Settings of register 144 can indicate configuration for I/O settings (e.g., timing, termination or ODT (on-die termination), driver configuration, and/or other I/O settings).
[0036] In one embodiment, memory device 140 includes ODT 146 as part of the interface hardware associated with I/O 142. ODT 146 can be configured as mentioned above, and provide settings for impedance to be applied to the interface to specified signal lines. The ODT settings can be changed based on whether a memory device is a selected target of an access operation or a non-target device. ODT 146 settings can affect the timing and reflections of signaling on the terminated lines. Careful control over ODT 146 can enable higher-speed operation with improved matching of applied impedance and loading.
[0037] Memory device 140 includes controller 150, which represents control logic within the memory device to control internal operations within the memory device. For example, controller 150 decodes commands sent by memory controller 120 and generates internal operations to execute or satisfy the commands. Controller 150 can be referred to as an internal controller. Controller 150 can determine what mode is selected based on register 144, and configure the access and/or execution of operations for memory resources 160 based on the selected mode. Controller 150 generates control signals to control the routing of bits within memory device 140 to provide a proper interface for the selected mode and direct a command to the proper memory locations or addresses.
[0038] Referring again to memory controller 120, memory controller 120 includes command (CMD) logic 124, which represents logic or circuitry to generate commands to send to memory devices 140. Typically, the signaling in memory subsystems includes address information within or accompanying the command to indicate or select one or more memory locations where the memory devices should execute the command. In one embodiment, controller 150 of memory device 140 includes command logic 152 to receive and decode command and address information received via I/O 142 from memory controller 120. Based on the received command and address information, controller 150 can control the timing of operations of the logic and circuitry within memory device 140 to execute the commands. Controller 150 is responsible for compliance with standards or specifications.
[0039] In one embodiment, memory controller 120 includes refresh (REF) logic 126. Refresh logic 126 can be used where memory devices 140 are volatile and need to be refreshed to retain a deterministic state. In one embodiment, refresh logic 126 indicates a location for refresh, and a type of refresh to perform. Refresh logic 126 can trigger self-refresh within memory device 140, and/or execute external refreshes by sending refresh commands. For example, in one embodiment, system 100 supports all bank refreshes as well as per bank refreshes, or other all bank and per bank commands. All bank commands cause an operation of a selected bank within all memory devices 140 coupled in parallel. Per bank commands cause the operation of a specified bank within a specified memory device 140. In one embodiment, controller 150 within memory device 140 includes refresh logic 154 to apply refresh within memory device 140. In one embodiment, refresh logic 154 generates internal operations to perform refresh in accordance with an external refresh received from memory controller 120. Refresh logic 154 can determine if a refresh is directed to memory device 140, and what memory resources 160 to refresh in response to the command.
[0040] In one embodiment, memory controller 120 includes error correction and control logic to perform system-level ECC for system 100. System-level ECC refers to application of error correction at memory controller 120, and can apply error correction to data bits from multiple different memory devices 140. In one embodiment, memory controller 120 includes ECS 170, which represents circuitry or logic to enable an ECS mode in one or more memory devices 140. In one embodiment, ECS 170 includes logic to set a mode register 144 of memory device 140 to trigger ECS mode. In one embodiment, ECS 170 includes logic to encode and send a command to trigger the ECS mode in one or more memory devices 140. In one embodiment, ECS 170 includes logic to read error information from memory device 140.
[0041] In one embodiment, controller 150 includes ECS logic 156, which represents logic in memory device 140 to enter ECS mode and perform one or more ECS operations. ECS logic 156 can be considered to include internal or on-die ECC logic to perform ECC operations. In one embodiment, ECS 156 reads a setting of register 144 to determine if ECS mode is enabled. In one embodiment, ECS 156 determines an ECS mode trigger is received via command logic decoding of logic 152. In one embodiment, ECS 170 of memory controller 120 sets one or more bits of register 144 to reset error information counts.
[0042] Typically, a DRAM has an on-die or internal oscillator (not specifically shown) to control the timing of internal operations. For a small number of operations, the difference in timing between an internal oscillator and system timing controlled by the host can be synchronized fairly easily. For a longer series of instructions, the timing drift between memory device 140 and the host can create synchronization issues. In one embodiment, memory controller 120 issues a series of ECS operation commands for memory device 140 to execute in ECS mode. In one embodiment, memory controller 120 simply puts memory device 140 into ECS mode and allows controller 150 to control the operations internally. In the case of a series of external operations from memory controller 120, controller 150 can generate internal commands from the external commands to sequence through the memory locations of memory resources 160 to perform ECC. In the case of the memory controller placing memory device 140 in ECS mode, controller 150 can generate the internal commands to sequence through the memory resources. In one embodiment, in either case controller 150 controls at least a portion of the generation of memory location addresses for the ECS operations.
[0043] Figure 2 is a block diagram of an embodiment of a system that performs internal error correction and stores error information. System 200 represents components of a memory subsystem. System 200 provides one example of a memory subsystem in accordance with an embodiment of system 100 of Figure 1. System 200 can be included in any type of computing device or electronic circuit that uses memory with internal ECC, where the memory devices count error information. Processor 210 represents any type of processing logic or component that executes operations based on data stored in memory 230 or to store in memory 230. Processor 210 can be or include a host processor, central processing unit (CPU), microcontroller or microprocessor, graphics processor, peripheral processor, application specific processor, or other processor. Processor 210 can be or include a single core or multicore circuit.
[0044] Memory controller 220 represents logic to interface with memory 230 and manage access to data of memory 230. As with the memory controller above, memory controller 220 can be separate from or part of processor 210. Processor 210 and memory controller 220 together can be considered a "host" from the perspective of memory 230, and memory 230 stores data for the host. In one embodiment, memory 230 includes DDR4 DRAMs that have internal ECC (which may be referred to in the industry as DDR4E). In one embodiment, system 200 includes multiple memory resources 230. Memory 230 can be implemented in system 200 in any type of architecture that supports access via memory controller 220 with use of internal ECC in the memory. Memory controller 220 includes I/O (input/output) 222, which includes hardware resources to interconnect with corresponding I/O 232 of memory 230.
[0045] Memory 230 includes command execution 234, which represents control logic within the memory device to receive and execute commands from memory controller 220. The commands can include a series of ECC operations for the memory device to perform in ECS mode to record error count information. In one embodiment, mode register 238 includes one or more multipurpose registers to store error count information. In one embodiment, mode register 238 includes one or more fields that can be set by memory controller 220 to enable the resetting of the error count information.
[0046] Memory 230 includes array 240, which represents the array of memory locations where data is stored in the memory device. In one embodiment, each address location 244 of array 240 includes associated data and ECC bits. In one embodiment, address locations 244 represent addressable chunks of data, such as 128-bit chunks, 64-bit chunks, or 256-bit chunks. In one embodiment, address locations 244 are organized as segments or groups of memory locations. For example, as illustrated, memory 230 includes multiple rows 242. In one embodiment, each row 242 is a segment or a portion of memory that is checked for errors. In one embodiment, rows 242 correspond to memory pages or wordlines. Array 240 includes N rows 242, and rows 242 include M memory locations.
[0047] In one embodiment, address locations 244 correspond to memory words, and rows 242 correspond to memory pages. A page of memory refers to a granular amount of memory space allocated for a memory access operation. In one embodiment, array 240 has a larger page size to accommodate the ECC bits in addition to the data bits. Thus, a normal page size would include enough space allocated for the data bits, and array 240 allocates enough space for the data bits plus the ECC bits.
[0048] In one embodiment, memory controller 220 includes ECS control 226 to manage an ECS mode for ECC operation and error counting in memory 230. In one embodiment, memory 230 includes internal ECC managed by ECS control 250. Memory controller 220 manages system wide ECC, and can detect and correct errors across multiple different memory resources in parallel (e.g., multiple memory resources 230). Many techniques for system wide ECC are known, and can include managing memory resources in a way to spread errors across multiple parallel resources. By spreading errors across multiple resources, memory controller 220 can recover data even in the event of one or more failures in memory 230. Memory failures are generally categorized as either soft errors or soft failures, which are transient bit errors typically resulting from random environmental conditions, or hard errors or hard failures, which are non-transient bit errors occurring as a result of a hardware failure.
[0049] In one embodiment, ECS control 250 includes a count 252 of rows 242 that include an SBE. In one embodiment, ECS control 250 includes one or more counters. For example, SBE count 252 can be one of the counters. ECS control 250 includes ECC logic (not specifically shown) to perform error checking and correction. When an error is detected in a row 242, ECS control 250 increments SBE count 252. In one embodiment, ECS control 250 includes max count and/or max address 254. In one embodiment, the max count can be kept in another counter. In one embodiment, the max address refers to an address of the segment or row 242 determined to have the highest number of errors.
[0050] In one embodiment, memory 230 is part of a rank of memory resources. In one embodiment, memory 230 includes multiple banks of memory resources, and each bank can be separately accessed. In one embodiment, all banks must be precharged and in the idle state prior to enabling ECS mode. In one embodiment, command execution 234 identifies a specific command sequence for ECS mode. In one embodiment, only a specific command sequence is permitted in ECS mode. While different implementations can vary, one example of an allowed command sequence could be as follows: ECS->DES->ACT->DES->WR->DES->PRE->DES. It will be understood that a NOP could be used in place of the deselect commands (DES), but a NOP command may require toggling a chip select (CS) bit and require the memory to decode the command, whereas a DES command allows the memory to simply remain idle for the cycle.
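As a rough illustration of restricting ECS mode to one permitted command sequence, the following Python sketch checks a list of command mnemonics against the example sequence of paragraph [0050]. The function name `is_valid_ecs_sequence` is hypothetical, and the sketch assumes that any number of DES (or NOP) idle slots may separate the significant commands, which the description does not state.

```python
# Significant commands of the example sequence, with idle slots removed.
ALLOWED_ECS_SEQUENCE = ("ECS", "ACT", "WR", "PRE")
IDLE_COMMANDS = ("DES", "NOP")


def is_valid_ecs_sequence(commands) -> bool:
    """Return True if the non-idle commands are exactly ECS, ACT, WR, PRE."""
    significant = tuple(c for c in commands if c not in IDLE_COMMANDS)
    return significant == ALLOWED_ECS_SEQUENCE


if __name__ == "__main__":
    example = ["ECS", "DES", "ACT", "DES", "WR", "DES", "PRE", "DES"]
    print(is_valid_ecs_sequence(example))
    print(is_valid_ecs_sequence(["ECS", "DES", "WR", "DES", "PRE"]))
```

A stricter checker could also require at least one idle slot between commands; that choice depends on the timing rules of the particular device.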
[0051] The command sequence can be described as an ECS command trigger, which may not be necessary if ECS mode is triggered via a mode register setting, followed by a Deselect (DES), an Activate (ACT), another DES, a Write (WR), another DES, a Precharge (PRE), and another DES. In one embodiment, the ECS command places memory 230 in ECS mode. In one embodiment, the ECS command is to be followed by an ACT command tMOD later; the ACT command is to be followed by a WR command tRCD later; and, the WR command is to be followed by a PRE command WL + tWR + 10 CKs later, provided tECSc is satisfied. In one embodiment, the minimum time for the ECS Mode period is tECSc (which can be the larger of 45 CKs or 110 ns). It will be understood that tMOD can refer to a timing delay for a mode register set command update, tRCD can refer to a timing delay from an ACT command to an internal read or write, WL can refer to a write latency, tWR can refer to a write recovery time, and CK can refer to a clock cycle.
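The timing relationships of paragraph [0051] can be worked through in clock cycles with a small Python sketch. The function name `ecs_schedule` and all numeric parameter values in the example are hypothetical and are not taken from any device specification. The sketch also adopts one possible reading of "provided tECSc is satisfied", namely that PRE is issued no earlier than tECSc after the ECS command.

```python
import math


def ecs_schedule(t_ck_ns, t_mod, t_rcd, wl, t_wr):
    """Earliest cycle for each command, with the ECS command at cycle 0.

    t_ck_ns is the clock period in ns; the other arguments are in clock cycles.
    """
    # tECSc is the larger of 45 CKs or 110 ns, expressed here in clock cycles.
    t_ecsc = max(45, math.ceil(110 / t_ck_ns))
    act = t_mod                      # ACT follows ECS by tMOD
    wr = act + t_rcd                 # WR follows ACT by tRCD
    pre = wr + wl + t_wr + 10        # PRE follows WR by WL + tWR + 10 CKs
    pre = max(pre, t_ecsc)           # assumed reading: hold PRE until tECSc
    return {"ECS": 0, "ACT": act, "WR": wr, "PRE": pre, "tECSc": t_ecsc}


if __name__ == "__main__":
    # Hypothetical values: 1.25 ns clock, tMOD=24, tRCD=11, WL=9, tWR=12 cycles.
    print(ecs_schedule(1.25, 24, 11, 9, 12))
```

With these illustrative values the 110 ns term dominates (88 cycles at 1.25 ns), so the PRE command is held until cycle 88 rather than issued at cycle 66.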
[0052] In one embodiment, memory 230 ignores data I/O and address inputs of I/O 232 while in ECS mode, for a period of tMOD after an ECS command is issued. Ignoring the address inputs can include ignoring bank address (BA) values and bank group (BG) values. In one embodiment, I/O control for I/O 232 sets the data I/O and address inputs into a tri-state operation.
[0053] In one embodiment, for an ECS->ACT->WR sequence (ignoring intervening DES commands), the ECS command will enable the ECS mode, and the ACT command will internally perform a row activation. The row activation can occur for a row determined by an internal ECS Address Counter within ECS control 250 (not specifically shown). In one embodiment, the WR command will perform an internal Read Modify Write cycle for the activated row, with a column address determined by an internal ECS Address Counter within ECS control 250. It will be understood that ECS control 250 can be part of an internal controller (such as controller 150 of system 100). Thus, counters and control within ECS control 250 can be part of logic within an internal controller. Alternatively, separate ECS control logic, such as a separate logic circuit or microcontroller, can implement the ECS operations.
[0054] In one embodiment, the internal Read and Write cycle or Read Modify Write cycle reads the entire code word and check bits from array 240 (e.g., 128 data bits and 8 check bits), corrects an SBE in the code word or check bits, and writes the resultant code word and check bits back to the appropriate row 242 in array 240. In one embodiment, a PRE command exits the ECS mode and returns memory 230 to normal mode.
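The read, correct, and write back cycle of paragraph [0054] can be sketched in Python. The description does not say which code the on-die ECC uses; this sketch assumes a textbook Hamming single-error-correcting code, which needs exactly 8 check bits for 128 data bits. The function names and the bit layout (check bits at power-of-two positions) are illustrative assumptions.

```python
DATA_BITS = 128
CHECK_BITS = 8
CODE_BITS = DATA_BITS + CHECK_BITS                        # 136 stored bits
PARITY_POSITIONS = [1 << i for i in range(CHECK_BITS)]    # 1, 2, 4, ..., 128


def compute_syndrome(code):
    """XOR of the (1-based) positions of all set bits; 0 means no error seen."""
    syndrome = 0
    for pos in range(1, CODE_BITS + 1):
        if code[pos]:
            syndrome ^= pos
    return syndrome


def encode(data):
    """Build a code word: data at non-power-of-two positions, then check bits."""
    if len(data) != DATA_BITS:
        raise ValueError("expected 128 data bits")
    code = [0] * (CODE_BITS + 1)     # index 0 is unused
    bits = iter(data)
    for pos in range(1, CODE_BITS + 1):
        if pos not in PARITY_POSITIONS:
            code[pos] = next(bits)
    syndrome = compute_syndrome(code)
    for p in PARITY_POSITIONS:       # set check bits so the syndrome becomes 0
        if syndrome & p:
            code[p] = 1
    return code


def scrub(code):
    """One read-modify-write: return (code word to write back, error_found)."""
    syndrome = compute_syndrome(code)
    if syndrome == 0:
        return list(code), False
    fixed = list(code)
    if syndrome <= CODE_BITS:        # syndrome names the single flipped bit
        fixed[syndrome] ^= 1
    return fixed, True               # larger syndromes are left uncorrected
```

A single flipped bit in either the data or the check bits yields a syndrome equal to its position, so `scrub` restores the original code word and reports that an error was found, which is the event an error counter would record.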
[0055] As referred to above, in one embodiment, ECS control 250 includes logic to internally generate address information for performing ECS operations. In one embodiment, an ECS mode sequences through an entire memory area, such as a bank group. In one embodiment, ECS control 250 keeps track of the addresses for sequencing through memory 230. In one embodiment, memory controller 220 keeps track of certain address information and memory 230 keeps track of other address information. For example, consider an implementation where memory controller 220 identifies a bank group and memory 230 keeps track of row addresses and column addresses within the bank group. Alternatively, consider that memory controller 220 identifies a bank group and column, and memory 230 identifies the row. Many other implementations are possible. The implementation will depend on the system design, such as whether the address locations are swizzled on memory 230 and the memory controller cannot effectively sequentially run through the check space. In one embodiment, the memory controller is responsible to provide enough ECC operation commands, and the memory device performs all addressing internally. In one embodiment, one ECS command causes an ECC operation on a single memory location, and the memory controller sends a sequence of ECS commands to perform ECC operations on many locations.
[0056] Regardless of whether memory 230 or memory controller 220 identifies the addressing, and regardless of whether an ECS command triggers an ECC operation on one or multiple memory locations, memory 230 counts the number of errors during the ECS mode operations. As described, the error count information can include the total number of rows or other segments that have errors, and the highest error count for any segment.
[0057] Figure 3 is a block diagram of an embodiment of a system in which a register stores a number of rows with errors, and a maximum error for any row. System 300 illustrates components of an ECC system that keeps counts of error information, and implements an example of ECS in accordance with an embodiment of system 100 of Figure 1 and/or system 200 of Figure 2. System 300 provides a representation of components to perform ECS mode operations. In one embodiment, the registers that store the end count are registers accessible to the host, and other components are within a controller on the memory device.
[0058] Address control 320 represents control logic to enable a memory device to manage internal addressing for ECS operations. As previously described, the memory device may be responsible for certain address information and the memory controller responsible for other address information. Thus, system 300 can have different implementations depending on the address information responsibility of the memory device. In one embodiment, address control 320 includes one or more ECS address counters to increment addresses for ECS commands. Address control 320 can include counters for column addresses, row addresses, bank addresses, bank group addresses, and/or a combination of these. In one embodiment, address control 320 includes a counter to increment the column address for a row or other segment for an ECS operation.
[0059] In one embodiment, address control 320 increments the column address on each ECS WR command, which can be triggered or indicated by ANDing an internal ECS command signal (ECS CMD) with an internal write command as indicated by an internal column address strobe for a write (CAS WR), as illustrated with logic 314. The internal signals refer to signals generated internally within a memory device by an internal controller, rather than specifying a command received from an external memory controller. The internal commands are generated in response to external commands from the host. In one embodiment, once the column counter wraps or rolls over, the row counter will sequentially increment and read each code word and associated check bits in the next row until all of the rows within a bank are accessed. In one embodiment, once the row counter wraps, the bank counter will sequentially increment and the next sequential bank within a bank group will repeat the process of accessing each code word and associated check bits until all banks and bank groups within the memory are accessed.
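The cascaded counters of paragraph [0059] behave like an odometer: the column counter advances on each ECS write, and each wrap carries into the next counter. The following Python sketch models that behavior; the class name `EcsAddressCounter`, its method names, and the tiny geometry used in the example are hypothetical.

```python
class EcsAddressCounter:
    """Hypothetical model of column -> row -> bank -> bank group counters."""

    def __init__(self, columns, rows, banks, bank_groups):
        self.limits = (columns, rows, banks, bank_groups)
        self.value = [0, 0, 0, 0]    # column, row, bank, bank group

    def reset(self):
        self.value = [0, 0, 0, 0]

    def current(self):
        col, row, bank, bg = self.value
        return {"bank_group": bg, "bank": bank, "row": row, "column": col}

    def advance(self):
        """Step to the next code word; return True when every counter wraps."""
        for i, limit in enumerate(self.limits):
            self.value[i] += 1
            if self.value[i] < limit:
                return False         # no carry into the next counter
            self.value[i] = 0        # wrap and carry
        return True

    def on_write_strobe(self, ecs_cmd, cas_wr):
        """Advance only when ECS CMD AND CAS WR are both asserted (logic 314)."""
        if ecs_cmd and cas_wr:
            return self.advance()
        return False


if __name__ == "__main__":
    counter = EcsAddressCounter(columns=2, rows=2, banks=2, bank_groups=2)
    for _ in range(5):
        print(counter.current())
        counter.on_write_strobe(ecs_cmd=True, cas_wr=True)
```

Keeping the counters in least-significant-first order makes the carry chain a single loop, and the return value of `advance` marks the point at which one full pass over the device has completed.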
[0060] In one embodiment, column, row, bank, bank group, and/or other counter to generate internal addresses in address control 320 can be reset in response to a RESET condition for the memory subsystem and/or in response to a bit in a mode register (MRx Ay bit), as illustrated with logic 312. A reset can occur on a power cycle or other system restart. In one embodiment, the address information generated by address control 320 can be used to control decode logic (e.g., row and column decoder circuits within the memory device) to trigger the ECS operations. However, seeing that for other operations the memory device operates on external address information received from the host, in one embodiment, system 300 includes multiplexer 370 to select between external address 372 and internal address information from address control 320. While not specifically shown, it will be understood that the control for mux 370 can be the ECS mode state, where in ECS mode the internal address generation is selected to access a memory location of the memory array, and when not in ECS mode external address 372 is selected to access the memory location of the memory array.
[0061] When the memory controller provides some address information, the selection logic of system 300 can be more complex. Based on internal address information and/or external address information, system 300 can check all code words and check bits within the memory. More specifically, system 300 can perform ECS operations, which include ECC operations to read, correct, and write memory locations for all selected or specified addresses. After every location has been operated on once, system 300 can be configured (e.g., via the size of the counters) to wrap the counters and restart the address count. For example, once all memory locations for a memory have been checked, a bank group counter can wrap and begin the process again with the next ECS command. The total number of ECS commands required to complete one cycle of ECS mode is density dependent. Table 1 provides one example of a listing of the number of commands required based on the number of code words for various memory configurations. In one embodiment, a DRAM controller keeps track of the number of ECS commands issued to the DRAM to error check all of the code words and check bits in the device.
Table 1: Number of Code Words per Bank Group (for 128 bit CW)
[0062] For 8 Gb devices having x4, x8, or x16 interfaces, the number of 128-bit code words is 67,108,864, which would require an equal number of commands to cycle through all memory locations. For 12 Gb devices having x4, x8, or x16 interfaces, the number of 128-bit code words is 100,663,296. For 16 Gb devices having x4, x8, or x16 interfaces, the number of 128-bit code words is 134,217,728.
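The code word counts above follow from dividing the device density by the code word size. The following sketch reproduces the arithmetic, assuming that 1 Gb denotes 2^30 data bits and that internal check bits are not counted toward density; the function name `code_words` is illustrative only.

```python
GIBIBIT = 2**30          # assumption: 1 Gb of density = 2^30 data bits
CODE_WORD_BITS = 128     # code word size used in Table 1


def code_words(density_gb: int, code_word_bits: int = CODE_WORD_BITS) -> int:
    """Number of code words, and thus ECS commands, for one full pass."""
    total_bits = density_gb * GIBIBIT
    return total_bits // code_word_bits


for density in (8, 12, 16):
    print(f"{density} Gb -> {code_words(density):,} code words")
```

The count depends only on density and code word size, which is consistent with the same number being given for x4, x8, and x16 interfaces.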
[0063] System 300 includes ECC correction logic 330 to perform error checking and correction operations on selected or addressed memory location(s). In one embodiment, logic 330 generates an output flag when an error is detected in a memory location. Logic 330 can generate the flag in response to detecting and correcting a single bit error (SBE) with ECC operations. In one embodiment, the error detection signal from logic 330 triggers ERC (error row counter) 332 and errors per row counter 344 to increment. In one embodiment, ERC 332 has a threshold number of memory locations that must trigger as having errors in a particular row or segment before counting the row as a row having errors (identified in some places as threshold N). For example, the threshold could be 1 error, and all rows having at least one error are counted. As another example, the threshold could be 2 errors, and a row with only one error will not be counted, but all rows having at least 2 errors are counted. It will be understood that the threshold can be set based on system configuration considerations. Thus, the definition of what a "bad row" is can be set by the threshold for ERC 332. If the threshold is one error, ERC 332 may be unnecessary, and the output of ECC logic 330 can be fed directly into row error counter 342.
[0064] As illustrated, when the threshold number of errors per row has been reached as determined by ERC 332, ERC 332 outputs a signal to cause row error counter 342 to increment its count. Every error detected by logic 330 causes errors per row counter 344 to increment. It will be understood that ERC 332 and errors per row counter 344 generate information on a per-row or per-segment basis; thus, when the row address rolls over, address control 320 can trigger ERC 332 and errors per row counter 344 to reset their counts.
[0065] Thus, in one embodiment, errors per row counter 344 increments each time a code word or a check bit error is detected and is reset each time the row counter wraps or rolls over. In one embodiment, if one or more code word or check bit errors are detected during ECS WR commands on a given row, row error counter 342 increments by one. In one embodiment, row error counter 342 increments by one for each two errors (or other number of errors configured) detected in a row.
[0066] Errors per row counter 344 tracks the total number of code words or check bit errors on a given row. In one embodiment, counter 344 provides its count for comparison to a previous high or maximum error count. High error count 352 represents a register or other storage that holds a high error count indicating the maximum number of errors detected in any of the rows. Comparison logic 350 determines if the count value in errors per row counter 344 is greater than high error count 352. In one embodiment, if the current error count in counter 344 is greater than the previous maximum of high error count 352, logic 350 generates an output that triggers high error address 354 to be set to the address of the current row. Additionally, high error count 352 is set to the count from errors per row counter 344. In one embodiment, the row code word or check bit error count is compared to the previous row code word or check bit error count to determine the row address with the highest error count within the DRAM.
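The interaction of the threshold, the row error counter, the errors per row counter, and the high error count can be sketched as follows. This is a simplified software model under stated assumptions, not the circuit of Figure 3: the class `EcsErrorTracker` and its method names are hypothetical, and a single per-row count stands in for both ERC 332 and errors per row counter 344, since both count the same events within a row.

```python
class EcsErrorTracker:
    """Toy model of the per-row and per-pass error bookkeeping."""

    def __init__(self, threshold: int = 1) -> None:
        self.threshold = threshold       # threshold N applied by ERC 332
        self.errors_in_row = 0           # errors per row counter 344
        self.rows_with_errors = 0        # row error counter 342
        self.high_error_count = 0        # high error count 352
        self.high_error_address = None   # high error address 354

    def record_error(self) -> None:
        """Called once per error detected in the current row."""
        self.errors_in_row += 1
        # Count the row as bad exactly once, when it reaches the threshold.
        if self.errors_in_row == self.threshold:
            self.rows_with_errors += 1

    def end_of_row(self, row_address: int) -> None:
        """Called when the row address rolls over to the next row."""
        # Strictly-greater comparison: on a tie the earlier row is kept.
        if self.errors_in_row > self.high_error_count:
            self.high_error_count = self.errors_in_row
            self.high_error_address = row_address
        self.errors_in_row = 0


# Usage: four rows with 1, 3, 0, and 3 errors and a threshold of 2.
tracker = EcsErrorTracker(threshold=2)
for row, error_count in {0: 1, 1: 3, 2: 0, 3: 3}.items():
    for _ in range(error_count):
        tracker.record_error()
    tracker.end_of_row(row)
```

In this usage, rows 1 and 3 meet the threshold, so two rows are counted; the high error count is 3 and the recorded address is row 1, the first row to reach that count.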
[0067] In one embodiment, after performing ECS operations for all rows in the memory space identified for checking, address control 320 triggers a signal indicating that the last address has been reached or rolled over. In one embodiment, the signal triggers row error register 362 to load the count from row error counter 342, and row error counter 342 is reset. In one embodiment, system 300 includes a delay in propagating the signal to guarantee that register 362 latches the count from counter 342 prior to the counter being reset. In one embodiment, the signal indicating the last address has been reached also triggers error per row register 364 to load the count from high error count 352. In one embodiment, the signal indicating the last address has been reached also triggers error per row register 364 to load the address from high error address 354. In one embodiment, the signal for register 364 to load the address and/or count also triggers high error count 352 to reset. In one embodiment, row error register 362 represents MPR PAGE 5 of a mode register of a DRAM. In one embodiment, error per row register 364 represents MPR PAGE 4 of a mode register of a DRAM.
[0068] Figure 4A is a block diagram of an embodiment of command encoding that enables an error check and scrub (ECS) mode. Command encoding 410 represents an example of ECS command encoding in accordance with any embodiment of ECS that triggers an ECS mode with command encoding. ECS command 412 illustrates one example of encoding, where the clock enable bits are set high, as well as ACT_n, CAS_n, and WE_n. Chip select CS_n is set low along with RAS_n. In one embodiment, the bank address information (BA and BG) are set low, and the row address bits A11:A0 are held high. In one embodiment, A12 is set low. In such an encoding, the DRAM will internally generate the addresses for ECS operations. Alternatively, command 412 can include certain address information.
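The example encoding of ECS command 412 can be written out as a table of signal levels. The sketch below is only an illustration of that description: the dictionary keys (including `CKE_prev` and `CKE_curr` for the clock enable bits, and `A11_A0` for the twelve low row address bits) and the function `is_ecs_command` are hypothetical names, and multi-bit fields are represented as integers.

```python
# Signal levels for the example ECS command encoding (1 = high, 0 = low).
ECS_COMMAND_ENCODING = {
    "CKE_prev": 1,    # clock enable bits set high (names are assumptions)
    "CKE_curr": 1,
    "CS_n": 0,        # chip select low
    "ACT_n": 1,
    "RAS_n": 0,       # low along with CS_n
    "CAS_n": 1,
    "WE_n": 1,
    "BG": 0,          # bank group bits all low
    "BA": 0,          # bank address bits all low
    "A12": 0,
    "A11_A0": 0xFFF,  # twelve address bits all held high
}


def is_ecs_command(pins: dict) -> bool:
    """Return True if the sampled signal levels match the example encoding."""
    return all(pins.get(name) == level
               for name, level in ECS_COMMAND_ENCODING.items())
```

A decoder built this way treats any deviation, such as A12 sampled high, as a different command.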
[0069] Figure 4B is a block diagram of an embodiment of a mode register that enables an error check and scrub (ECS) mode. Mode register 420 represents an example of a mode register in a DRAM that supports an ECS mode. In one embodiment, what is illustrated as mode register 420 can be two separate mode registers. As illustrated, address 424 (Az) represents a bit that triggers ECS mode, in accordance with any embodiment of ECS that triggers an ECS mode with a mode register. Address 422 (Ay) represents a bit that triggers ECS counter reset. In an implementation where command encoding triggers ECS mode, address 424 may not be used, while address 422 could still provide the ability to reset ECS counters. In one embodiment, a '1' written to address 424 can trigger ECS mode to cause a memory device to perform ECC operations and count errors, while a '0' can exit ECS mode.
[0070] In one embodiment, an ECS mode requires two additional mode register bits. A first bit is address 422, which enables the clearing of counters and error result registers. The encoding could be reversed, but as illustrated, a '1' can clear the counters and result registers, and a '0' can initialize the registers and counters. In one embodiment, Ay must be written to '0' before a subsequent '1' can be applied to clear the counters and registers. A second bit is not specifically illustrated, and is a mode register bit to enable the result registers. For example, in one embodiment, the mode register bit enables MPR Pages 4 to 7, where results are stored in MPR 4 and 5 (Pages 4 and 5).
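The requirement that Ay be written to '0' before a subsequent '1' can clear the counters amounts to acting only on a 0-to-1 transition. The following sketch models that one aspect, assuming the bit starts at '0'; the class `EcsResetBit` and its members are hypothetical, and the initialization behavior of a '0' write is not modeled.

```python
class EcsResetBit:
    """Toy model of mode register bit Ay acting on a 0-to-1 transition."""

    def __init__(self) -> None:
        self.ay = 0            # assumption: bit starts at '0'
        self.clear_count = 0   # number of times counters were cleared

    def write(self, value: int) -> bool:
        """Write Ay; return True if this write clears counters/registers."""
        if value not in (0, 1):
            raise ValueError("Ay is a single bit")
        cleared = value == 1 and self.ay == 0
        self.ay = value
        if cleared:
            self.clear_count += 1
        return cleared
```

Writing '1' twice in a row clears only once; a '0' must be written in between before another clear takes effect.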
[0071] Figure 4C is a block diagram of an embodiment of a multipurpose register to store an address of a row with a maximum error count. MPR page 430 represents a register that stores an address of a row with a maximum number of errors. For example, MPR page 430 can be MPR page 4 in a DDR4E implementation. In one embodiment, address BA1:BA0 provides an MPR location. For example, '00' can correspond to MPR0, '01' can correspond to MPR1, '10' can correspond to MPR2, and '11' can correspond to MPR3. In one embodiment, the bits of MPR page 430 are allocated as: A[17:0] is the Row Address, BA[1:0] is the Bank Address, BG[2:0] is the Bank Group Address, and EC[5:0] are the number of code word and check bit errors, limited to a maximum of 64 errors (2^6). In one embodiment, only a first row having the bit-maximum number of errors will be recorded in MPR page 430, and if any other rows also have the maximum number of errors, they will not be specifically identified by address.
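The field widths listed for MPR page 430 total 29 bits (18 + 2 + 3 + 6). The sketch below packs and unpacks those fields into a single integer. The bit ordering used here (EC in the least significant bits, then BG, BA, and A) is purely an assumption for illustration, as are the names `FIELDS`, `pack`, and `unpack`; the actual placement of bits across MPR0 to MPR3 is defined by Figure 4C and is not reproduced here. A 6-bit field can represent values 0 through 63, so the sketch rejects larger counts.

```python
# (name, width) pairs, listed from least significant bits upward.
# This ordering is a hypothetical layout, not the one in Figure 4C.
FIELDS = (("EC", 6), ("BG", 3), ("BA", 2), ("A", 18))


def pack(values: dict) -> int:
    """Pack the field values into one integer, checking each field's range."""
    word, shift = 0, 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= value << shift
        shift += width
    return word


def unpack(word: int) -> dict:
    """Recover the field values from a packed integer."""
    values, shift = {}, 0
    for name, width in FIELDS:
        values[name] = (word >> shift) & ((1 << width) - 1)
        shift += width
    return values
```

A host that reads the result register could use a routine like `unpack` to separate the row, bank, and bank group address from the error count.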
[0072] In one embodiment, MPR page 430 stores the information for access by the host. In one embodiment, MPR page 430 is not automatically cleared after being read by the host, and should be read by the host each time a complete sequence of ECS commands is performed. In one embodiment, the register is reset to zeros by the host prior to a subsequent sequencing through the DRAM.
[0073] Figure 4D is a block diagram of an embodiment of a multipurpose register to store a count of a number of rows containing an error. MPR page 440 represents a register that stores a count of the number of rows containing an error. For example, MPR page 440 can be MPR page 5 in a DDR4E implementation. In one embodiment, address BA1:BA0 provides an MPR location. For example, '00' can correspond to MPR0, '01' can correspond to MPR1, '10' can correspond to MPR2, and '11' can correspond to MPR3. In one embodiment, the bits of MPR page 440 are allocated as: EC[N-1:0] is the number of rows with at least a threshold (e.g., at least 1 or at least 2) code word or check bit errors, up to a maximum of 2^N - 1 rows. In one embodiment, N=16 for a maximum of 65,536 rows. As illustrated, N=20. Locations marked RFU can be reserved for future use.
[0074] In one embodiment, MPR page 440 stores the information for access by the host. In one embodiment, MPR page 440 is not automatically cleared after being read by the host, and should be read by the host each time a complete sequence of ECS commands is performed. In one embodiment, the register is reset to zeros by the host prior to a subsequent sequencing through the DRAM.
[0075] Figure 5 is a block diagram of an embodiment of logic at a memory device that generates error correction information and supports an error check and scrub mode. System 500 is one example of ECC component operation for a memory subsystem with a memory device having internal ECC that supports an ECS mode to count error information, in accordance with an embodiment described herein. System 500 provides an example of internal ECC in a DRAM, which generates and stores internal check bits. Host 510 includes a memory controller or equivalent or alternative circuit or component that manages access to memory 520. Host 510 performs external ECC on data read from memory 520.
[0076] System 500 illustrates write path 532 in memory 520, which represents a path for data written from host 510 to memory 520. Host 510 provides data 542 to memory 520 for writing to the memory array(s). In one embodiment, memory 520 generates check bits 544 with check bit generator 522 to store with the data in memory, which can be one example of internal ECC bits used for code word checking/correction. Check bits 544 can enable memory 520 to correct an error that might occur in the writing to and reading from the memory array(s). Data 542 and check bits 544 can be included as code word in 546, which is written to the memory resources. It will be understood that check bits 544 represent internal check bits within the memory device. In one embodiment, there is no write path to check bits 544.
[0077] Read path 534 represents a path for data read from memory 520 to host 510. In one embodiment, at least certain hardware components of write path 532 and read path 534 are the same hardware. In one embodiment, memory 520 fetches code word out 552 in response to a Read command from host 510. The code word can include data 554 and check bits 556. Data 554 and check bits 556 can correspond, respectively, to data 542 and check bits 544 written in write path 532, if the address location bits of the write and read commands are the same. It will be understood that error correction in read path 534 can include the application of an XOR (exclusive OR) tree to a corresponding H matrix to detect errors and selectively correct errors (in the case of a single bit error).
[0078] As is understood in the art, an H matrix refers to a Hamming code parity-check matrix that shows how linear combinations of digits of the codeword equal zero. Thus, the H matrix rows identify the coefficients of parity check equations that must be satisfied for a component or digit to be part of a codeword. In one embodiment, memory 520 includes syndrome decode 524, which enables the memory to apply check bits 556 to data 554 to detect errors in the read data. Syndrome decode 524 can generate syndrome 558 for use in generating appropriate error information for the read data. Data 554 can also be forwarded to error correction 528 for correction of a detected error.
[0079] In one embodiment, syndrome decode 524 passes syndrome 558 to syndrome generator 526 to generate an error vector. In one embodiment, check bit generator 522 and syndrome generator 526 are fully specified by a corresponding H matrix for the memory device. In one embodiment, if there are no errors in the read data (e.g., zero syndrome 558), syndrome generator 526 can generate a signal to cause data 554 to be written back to the memory. In one embodiment, if there is a single bit error (e.g., non-zero syndrome 558 that matches one of the columns of a corresponding H matrix), syndrome generator 526 can generate a CE (corrected error) signal with error location 564, which is a corrected error indication to error correction logic 528. Error correction 528 can apply the corrected error to the specified location in data 554 to generate corrected data 566 for writing back to the memory. In ECS mode, the ECC can check and scrub errors in memory locations with ECC operations.
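The relationship between a zero syndrome, a syndrome that matches a column of the H matrix, and single bit correction can be illustrated with a small Hamming(7,4) code. This is a stand-in chosen for brevity and is not the code used for the 128-bit code words of the memory device; the function names are hypothetical. In this code, column j of the H matrix (1-indexed) is the number j in binary, so a non-zero syndrome directly names the position of the flipped bit.

```python
from functools import reduce

# Parity-check matrix H for Hamming(7,4), one integer per column:
# column j (1-indexed) is the value j written in binary.
H_COLUMNS = [1, 2, 3, 4, 5, 6, 7]


def syndrome(code_word: list) -> int:
    """XOR together the H columns selected by the set bits of the code word."""
    selected = (col for bit, col in zip(code_word, H_COLUMNS) if bit)
    return reduce(lambda acc, col: acc ^ col, selected, 0)


def encode(data: list) -> list:
    """Place 4 data bits at positions 3, 5, 6, 7; check bits at 1, 2, 4."""
    d3, d5, d6, d7 = data
    p1 = d3 ^ d5 ^ d7   # covers positions 1, 3, 5, 7
    p2 = d3 ^ d6 ^ d7   # covers positions 2, 3, 6, 7
    p4 = d5 ^ d6 ^ d7   # covers positions 4, 5, 6, 7
    return [p1, p2, d3, p4, d5, d6, d7]


def check_and_correct(code_word: list):
    """Return (corrected code word, index of corrected bit or None)."""
    s = syndrome(code_word)
    if s == 0:
        return list(code_word), None       # zero syndrome: no error found
    corrected = list(code_word)
    corrected[s - 1] ^= 1                  # syndrome matches column s of H
    return corrected, s - 1


# Usage: encode, inject a single bit error, then check and scrub.
stored = encode([1, 0, 1, 1])
faulty = list(stored)
faulty[4] ^= 1
scrubbed, fixed_index = check_and_correct(faulty)
```

In ECS terms, `scrubbed` is the data written back to the memory location, and a non-`None` `fixed_index` corresponds to a corrected error event that the counters would record.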
[0080] Figure 6 is a flow diagram of an embodiment of a process for monitoring one or more error counts via error correction operations in an error check and scrub (ECS) mode. The process for performing ECS operations can be performed in an embodiment of a memory subsystem that supports ECS mode as described herein. A memory subsystem includes memory controller 602 and memory 604. Memory 604 can be a memory device in accordance with any embodiment described herein. The memory enters ECS mode in response to a trigger. In ECS mode, the memory device checks for errors and corrects them, while counting error information that the memory controller can read.
[0081] In one embodiment, memory controller 602 writes data to the memory device with a write command, 612. In response to the write command, in one embodiment, memory 604 can generate internal error correction, including generating internal check bits. Memory 604 records or stores the data and corresponding ECC bits, 614. Memory 604 can later use the ECC bits to perform ECC operations, such as ECC operations in an ECS mode in accordance with any embodiment described herein.
[0082] In one embodiment, the memory controller determines to perform an error check and scrub (ECS), 620. Memory controller 602 generates and sends an ECS mode trigger, 622. In one embodiment, the ECS mode trigger is an ECS command that is a command sent with specific encoding to cause memory 604 to perform ECS operations. In one embodiment, the ECS mode trigger is one or more bits of a mode register. For example, memory controller 602 can perform a mode register set command to place the memory in ECS mode. In response to the ECS mode trigger, the memory enters ECS mode, 624.
[0083] In one embodiment, the memory generates internal address information for the ECS operations, 626. For example, memory 604 can include an internal controller that manages the ECS operations, including the generation of address information. Such a controller can include one or more counters to track addresses for sequencing through the memory for ECS operation. The memory generates internal signals to perform the ECS operations, 628. The internal signals can include the specific operations to perform on the address locations for ECS mode, and can include generating and/or decoding address information for sequencing through the memory.
[0084] In one embodiment, the ECS operations include reading, correcting, and writing back data to a specified memory location. Thus, in one embodiment, the memory reads one or more memory locations of a portion of the memory (such as a row) and performs error correction on the memory locations, 630. The error correction can be ECC operations based on the stored ECC bits generated and stored with the data. The memory counts the number of portions with errors, 632. In one embodiment, the memory counts an error for every error found within the portion (e.g., within any addressable memory location of the memory portion). In one embodiment, the memory only counts an error for portions that have at least a threshold number of errors (e.g., 2). Thus, the memory can count an error for every N errors detected, or increment the count for every portion or segment having at least N errors. In one embodiment, the memory counts the number of errors per portion as well as the number of portions having errors, 634. The number of errors per portion indicates a maximum number of errors in any segment.
[0085] In one embodiment, the memory determines if a count of the number of errors per portion exceeds a previous maximum number of errors, 636. If the number of errors does not exceed the maximum, 638 NO branch, the memory can continue to sequence through the memory at 644. If the number of errors exceeds the previous maximum, 638 YES branch, in one embodiment, the memory stores the number of errors as the maximum, 640, and stores the address information for the portion with the maximum error count, 642.
[0086] The memory can determine if there are more memory portions to check and scrub, 644. If there are more portions to check, 646 YES branch, the memory can generate a subsequent address for ECS operations, 626. If there are no more portions to check, 646 NO branch, in one embodiment, the memory stores the error count and maximum error counts, 648. For example, the memory can store the error information in one or more registers to be available for access by memory controller 602. The memory can exit ECS mode after completion of all checking, 650. In one embodiment, the memory controller accesses the error information, 652.
[0087] Figure 7 is a block diagram of an embodiment of a computing system in which an error check and scrub mode with error tracking can be implemented. System 700 represents a computing device in accordance with any embodiment described herein, and can be a laptop computer, a desktop computer, a server, a gaming or entertainment control system, a scanner, copier, printer, routing or switching device, or other electronic device. System 700 includes processor 720, which provides processing, operation management, and execution of instructions for system 700. Processor 720 can include any type of microprocessor, central processing unit (CPU), processing core, or other processing hardware to provide processing for system 700. Processor 720 controls the overall operation of system 700, and can be or include, one or more programmable general-purpose or special-purpose microprocessors, digital signal processors (DSPs), programmable controllers, application specific integrated circuits (ASICs), programmable logic devices (PLDs), or the like, or a combination of such devices.
[0088] Memory subsystem 730 represents the main memory of system 700, and provides temporary storage for code to be executed by processor 720, or data values to be used in executing a routine. Memory subsystem 730 can include one or more memory devices such as read-only memory (ROM), flash memory, one or more varieties of random access memory (RAM), or other memory devices, or a combination of such devices. Memory subsystem 730 stores and hosts, among other things, operating system (OS) 736 to provide a software platform for execution of instructions in system 700. Additionally, other instructions 738 are stored and executed from memory subsystem 730 to provide the logic and the processing of system 700. OS 736 and instructions 738 are executed by processor 720. Memory subsystem 730 includes memory device 732 where it stores data, instructions, programs, or other items. In one embodiment, memory subsystem includes memory controller 734, which is a memory controller to generate and issue commands to memory device 732. It will be understood that memory controller 734 could be a physical part of processor 720.
[0089] Processor 720 and memory subsystem 730 are coupled to bus/bus system 710. Bus 710 is an abstraction that represents any one or more separate physical buses, communication lines/interfaces, and/or point-to-point connections, connected by appropriate bridges, adapters, and/or controllers. Therefore, bus 710 can include, for example, one or more of a system bus, a Peripheral Component Interconnect (PCI) bus, a HyperTransport or industry standard architecture (ISA) bus, a small computer system interface (SCSI) bus, a universal serial bus (USB), or an Institute of Electrical and Electronics Engineers (IEEE) standard 1394 bus (commonly referred to as "Firewire"). The buses of bus 710 can also correspond to interfaces in network interface 750.
[0090] System 700 also includes one or more input/output (I/O) interface(s) 740, network interface 750, one or more internal mass storage device(s) 760, and peripheral interface 770 coupled to bus 710. I/O interface 740 can include one or more interface components through which a user interacts with system 700 (e.g., video, audio, and/or alphanumeric interfacing). Network interface 750 provides system 700 the ability to communicate with remote devices (e.g., servers, other computing devices) over one or more networks. Network interface 750 can include an Ethernet adapter, wireless interconnection components, USB (universal serial bus), or other wired or wireless standards-based or proprietary interfaces.
[0091] Storage 760 can be or include any conventional medium for storing large amounts of data in a nonvolatile manner, such as one or more magnetic, solid state, or optical based disks, or a combination. Storage 760 holds code or instructions and data 762 in a persistent state (i.e., the value is retained despite interruption of power to system 700). Storage 760 can be generically considered to be a "memory," although memory 730 is the executing or operating memory to provide instructions to processor 720. Whereas storage 760 is nonvolatile, memory 730 can include volatile memory (i.e., the value or state of the data is indeterminate if power is interrupted to system 700).
[0092] Peripheral interface 770 can include any hardware interface not specifically mentioned above. Peripherals refer generally to devices that connect dependently to system 700. A dependent connection is one where system 700 provides the software and/or hardware platform on which operation executes, and with which a user interacts.
[0093] In one embodiment, memory 732 is a DRAM. In one embodiment, processor 720 represents one or more processors that execute data stored in one or more DRAM memories 732. In one embodiment, network interface 750 exchanges data with another device in another network location, and the data is data stored in memory 732. In one embodiment, system 700 includes ECS control 780 to manage ECS mode operation for the system. ECS control 780 represents ECS logic to perform ECS operations to check and correct errors internally within memory 732, and count error information in accordance with any embodiment described herein.
[0094] Figure 8 is a block diagram of an embodiment of a mobile device in which an error check and scrub mode with error tracking can be implemented. Device 800 represents a mobile computing device, such as a computing tablet, a mobile phone or smartphone, a wireless-enabled e-reader, wearable computing device, or other mobile device. It will be understood that certain of the components are shown generally, and not all components of such a device are shown in device 800.
[0095] Device 800 includes processor 810, which performs the primary processing operations of device 800. Processor 810 can include one or more physical devices, such as microprocessors, application processors, microcontrollers, programmable logic devices, or other processing means. The processing operations performed by processor 810 include the execution of an operating platform or operating system on which applications and/or device functions are executed. The processing operations include operations related to I/O (input/output) with a human user or with other devices, operations related to power management, and/or operations related to connecting device 800 to another device. The processing operations can also include operations related to audio I/O and/or display I/O.
[0096] In one embodiment, device 800 includes audio subsystem 820, which represents hardware (e.g., audio hardware and audio circuits) and software (e.g., drivers, codecs) components associated with providing audio functions to the computing device. Audio functions can include speaker and/or headphone output, as well as microphone input. Devices for such functions can be integrated into device 800, or connected to device 800. In one embodiment, a user interacts with device 800 by providing audio commands that are received and processed by processor 810.
[0097] Display subsystem 830 represents hardware (e.g., display devices) and software (e.g., drivers) components that provide a visual and/or tactile display for a user to interact with the computing device. Display subsystem 830 includes display interface 832, which includes the particular screen or hardware device used to provide a display to a user. In one embodiment, display interface 832 includes logic separate from processor 810 to perform at least some processing related to the display. In one embodiment, display subsystem 830 includes a touchscreen device that provides both output and input to a user. In one embodiment, display subsystem 830 includes a high definition (HD) display that provides an output to a user. High definition can refer to a display having a pixel density of approximately 100 PPI (pixels per inch) or greater, and can include formats such as full HD (e.g., 1080p), retina displays, 4K (ultra high definition or UHD), or others.
[0098] I/O controller 840 represents hardware devices and software components related to interaction with a user. I/O controller 840 can operate to manage hardware that is part of audio subsystem 820 and/or display subsystem 830. Additionally, I/O controller 840 illustrates a connection point for additional devices that connect to device 800 through which a user might interact with the system. For example, devices that can be attached to device 800 might include microphone devices, speaker or stereo systems, video systems or other display device, keyboard or keypad devices, or other I/O devices for use with specific applications such as card readers or other devices.
[0099] As mentioned above, I/O controller 840 can interact with audio subsystem 820 and/or display subsystem 830. For example, input through a microphone or other audio device can provide input or commands for one or more applications or functions of device 800. Additionally, audio output can be provided instead of or in addition to display output. In another example, if display subsystem includes a touchscreen, the display device also acts as an input device, which can be at least partially managed by I/O controller 840. There can also be additional buttons or switches on device 800 to provide I/O functions managed by I/O controller 840.
[00100] In one embodiment, I/O controller 840 manages devices such as accelerometers, cameras, light sensors or other environmental sensors, gyroscopes, global positioning system (GPS), or other hardware that can be included in device 800. The input can be part of direct user interaction, as well as providing environmental input to the system to influence its operations (such as filtering for noise, adjusting displays for brightness detection, applying a flash for a camera, or other features). In one embodiment, device 800 includes power management 850 that manages battery power usage, charging of the battery, and features related to power saving operation.
[00101] Memory subsystem 860 includes memory device(s) 862 for storing information in device 800. Memory subsystem 860 can include nonvolatile (state does not change if power to the memory device is interrupted) and/or volatile (state is indeterminate if power to the memory device is interrupted) memory devices. Memory 860 can store application data, user data, music, photos, documents, or other data, as well as system data (whether long-term or temporary) related to the execution of the applications and functions of device 800. In one embodiment, memory subsystem 860 includes memory controller 864 (which could also be considered part of the control of device 800, and could potentially be considered part of processor 810). Memory controller 864 includes a scheduler to generate and issue commands to memory device 862.
[00102] Connectivity 870 includes hardware devices (e.g., wireless and/or wired connectors and communication hardware) and software components (e.g., drivers, protocol stacks) to enable device 800 to communicate with external devices. The external device could be separate devices, such as other computing devices, wireless access points or base stations, as well as peripherals such as headsets, printers, or other devices.
[00103] Connectivity 870 can include multiple different types of connectivity. To generalize, device 800 is illustrated with cellular connectivity 872 and wireless connectivity 874. Cellular connectivity 872 refers generally to cellular network connectivity provided by wireless carriers, such as provided via GSM (global system for mobile communications) or variations or derivatives, CDMA (code division multiple access) or variations or derivatives, TDM (time division multiplexing) or variations or derivatives, LTE (long term evolution - also referred to as "4G"), or other cellular service standards. Wireless connectivity 874 refers to wireless connectivity that is not cellular, and can include personal area networks (such as Bluetooth), local area networks (such as WiFi), and/or wide area networks (such as WiMax), or other wireless communication. Wireless communication refers to transfer of data through the use of modulated electromagnetic radiation through a non-solid medium. Wired communication occurs through a solid communication medium.
[00104] Peripheral connections 880 include hardware interfaces and connectors, as well as software components (e.g., drivers, protocol stacks) to make peripheral connections. It will be understood that device 800 could both be a peripheral device ("to" 882) to other computing devices, as well as have peripheral devices ("from" 884) connected to it. Device 800 commonly has a "docking" connector to connect to other computing devices for purposes such as managing (e.g., downloading and/or uploading, changing, synchronizing) content on device 800. Additionally, a docking connector can allow device 800 to connect to certain peripherals that allow device 800 to control content output, for example, to audiovisual or other systems.
[00105] In addition to a proprietary docking connector or other proprietary connection hardware, device 800 can make peripheral connections 880 via common or standards-based connectors. Common types can include a Universal Serial Bus (USB) connector (which can include any of a number of different hardware interfaces), DisplayPort including MiniDisplayPort (MDP), High Definition Multimedia Interface (HDMI), Firewire, or other type.
[00106] In one embodiment, memory 862 is a DRAM. In one embodiment, processor 810 represents one or more processors that execute data stored in one or more DRAM memories 862. In one embodiment, system 800 includes ECS control 890 to manage ECS mode operation for the system. ECS control 890 represents ECS logic to perform ECS operations to check and correct errors internally within memory 862, and count error information in accordance with any embodiment described herein.
[00107] In one aspect, a dynamic random access memory device (DRAM) includes: a storage array including multiple memory segments, the memory segments including multiple memory locations to store data and error checking and correction (ECC) information associated with the data; I/O (input/output) circuitry to couple to an associated memory controller, the I/O circuitry to receive a trigger for an error check and scrub (ECS) mode when coupled to the associated memory controller; and an internal controller on the DRAM, responsive to the trigger for the ECS mode, to read one or more memory locations, perform ECC for the one or more memory locations based on the ECC information, and count error information, the error information including a segment count indicating a number of segments having N or more errors, and a maximum count indicating a maximum number of errors in any segment.
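The counting behavior described in this aspect can be illustrated with a minimal sketch. The sketch is not the DRAM's actual on-die logic: it assumes a toy Hamming(7,4) single-error-correcting code in place of whatever ECC the device uses, models each memory segment as a list of 7-bit codewords, and uses hypothetical names (`hamming74_encode`, `hamming74_syndrome`, `EcsCounters`, `ecs_pass`) chosen only for illustration. It shows one pass that reads every location, corrects single-bit errors, and accumulates the segment count (segments with N or more errors) and the maximum count (most errors in any one segment).

```python
from dataclasses import dataclass
from typing import List, Optional


def hamming74_encode(nibble: int) -> int:
    """Encode 4 data bits into a 7-bit codeword (parity at positions 1, 2, 4)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    bits = [0] * 8  # index 0 unused so that indices match bit positions 1..7
    bits[3], bits[5], bits[6], bits[7] = d
    bits[1] = bits[3] ^ bits[5] ^ bits[7]
    bits[2] = bits[3] ^ bits[6] ^ bits[7]
    bits[4] = bits[5] ^ bits[6] ^ bits[7]
    return sum(bits[p] << (p - 1) for p in range(1, 8))


def hamming74_syndrome(codeword: int) -> int:
    """Return 0 for a clean codeword, else the 1-based position of a single flipped bit."""
    syndrome = 0
    for position in range(1, 8):
        if (codeword >> (position - 1)) & 1:
            syndrome ^= position
    return syndrome


@dataclass
class EcsCounters:
    segment_count: int = 0              # segments with at least N errors
    max_count: int = 0                  # largest error count in any one segment
    max_segment: Optional[int] = None   # index of the segment holding max_count


def ecs_pass(array: List[List[int]], n_threshold: int = 1) -> EcsCounters:
    """Check and scrub every location once, counting errors per segment (row)."""
    counters = EcsCounters()
    for row_index, row in enumerate(array):
        errors_in_row = 0
        for col, word in enumerate(row):
            syndrome = hamming74_syndrome(word)
            if syndrome:
                # Scrub: flip the failing bit and write the corrected word back.
                row[col] = word ^ (1 << (syndrome - 1))
                errors_in_row += 1
        if errors_in_row >= n_threshold:
            counters.segment_count += 1
        if errors_in_row > counters.max_count:
            counters.max_count = errors_in_row
            counters.max_segment = row_index
    return counters
```

With N set to 2, a row containing a single corrected error is scrubbed but does not contribute to the segment count, while it can still set the maximum count if no other row has more errors.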
[00108] In one embodiment, the DRAM includes a double data rate version 4 extended (DDR4E) compliant synchronous dynamic random access memory device (SDRAM). In one embodiment, the memory segments comprise DRAM rows. In one embodiment, the trigger comprises an ECS command generated by the memory controller. In one embodiment, the trigger comprises a mode register setting set by the memory controller. In one embodiment, the internal controller is further to generate address information for the memory locations responsive to the trigger for the ECS mode. In one embodiment, memory controller is to identify a bank group associated with the trigger for the ECS mode, and wherein the internal controller is further to generate address information for specific rows within the bank group. In one embodiment, the internal controller is to perform single bit error (SBE) ECC for the one or more memory locations in response to the trigger for ECS mode. In one embodiment, N equals 1. In one embodiment, N equals 2. In one embodiment, further comprising the internal controller to store the segment count in a multipurpose register (MPR) of a mode register of the DRAM. In one embodiment, further comprising the internal controller to store an address of a segment having the maximum count, wherein the internal controller is to store the address in a multipurpose register (MPR) of a mode register of the DRAM.
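The division of addressing work described above, in which the memory controller names only a bank group and the internal controller generates the row addresses, can be sketched as follows. This is an illustrative model under stated assumptions, not the device's address logic: the geometry parameters (`banks_per_group`, `rows_per_bank`), the number of rows handled per trigger, the wrap-around order, and the names `RowAddress` and `EcsAddressCounter` are all hypothetical.

```python
from typing import Dict, List, NamedTuple


class RowAddress(NamedTuple):
    bank_group: int
    bank: int
    row: int


class EcsAddressCounter:
    """Hypothetical internal counter that remembers, per bank group, which row to scrub next."""

    def __init__(self, banks_per_group: int, rows_per_bank: int) -> None:
        self.banks_per_group = banks_per_group
        self.rows_per_bank = rows_per_bank
        self._next: Dict[int, int] = {}  # bank group -> next linear row index

    def next_rows(self, bank_group: int, count: int = 1) -> List[RowAddress]:
        """Generate `count` row addresses for one ECS trigger targeting `bank_group`."""
        total = self.banks_per_group * self.rows_per_bank
        start = self._next.get(bank_group, 0)
        addresses = []
        for offset in range(count):
            linear = (start + offset) % total  # wrap after the last row of the group
            addresses.append(
                RowAddress(bank_group, linear // self.rows_per_bank, linear % self.rows_per_bank)
            )
        self._next[bank_group] = (start + count) % total
        return addresses
```

Because the counter state lives in the device model, the controller side of the sketch supplies only the bank group identifier with each trigger and never tracks which rows have already been checked.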
[00109] In one aspect, a method for error correction management in a memory subsystem includes: receiving a trigger for an error check and scrub (ECS) mode at a memory device having a storage array including multiple memory segments, the memory segments including multiple memory locations to store data and error checking and correction (ECC) information associated with the data; reading one or more of the memory locations responsive to receiving the trigger for the ECS mode; performing ECC for the one or more memory locations based on the ECC information; and counting error information, the error information including a segment count indicating a number of segments having at least a threshold number of errors, and a maximum count indicating a maximum number of errors in any segment.
[00110] In one embodiment, the memory device includes a double data rate version 4 extended (DDR4E) compliant synchronous dynamic random access memory device (SDRAM). In one embodiment, the memory segments comprise DRAM rows. In one embodiment, receiving the trigger comprises receiving an ECS command generated by the memory controller. In one embodiment, receiving the trigger comprises receiving a mode register setting set by the memory controller. In one embodiment, further comprising generating address information for the memory locations responsive to the trigger for the ECS mode. In one embodiment, further comprising identifying a bank group associated with the trigger for the ECS mode, and generating address information for specific rows within the bank group. In one embodiment, performing ECC comprises performing single bit error (SBE) ECC for the one or more memory locations in response to the trigger for ECS mode. In one embodiment, the threshold number equals 1. In one embodiment, the threshold number equals 2. In one embodiment, further comprising storing the segment count in a multipurpose register (MPR) of a mode register of the memory device. In one embodiment, further comprising storing an address of a segment having the maximum count. In one embodiment, further comprising storing the segment count in a multipurpose register (MPR) of a mode register of the memory device; and storing an address of a segment having the maximum count.
[00111] In one aspect, a system with a memory subsystem includes: a memory controller; and multiple double data rate version 4 extended (DDR4E) synchronous dynamic random access memory devices (SDRAMs) including a storage array including multiple memory segments, the memory segments including multiple memory locations to store data and error checking and correction (ECC) information associated with the data; I/O (input/output) circuitry coupled to the memory controller, the I/O circuitry to receive a trigger for an error check and scrub (ECS) mode from the memory controller; and an internal controller, responsive to the trigger for the ECS mode, to read one or more memory locations, perform ECC for the one or more memory locations based on the ECC information, and count error information, the error information including a segment count indicating a number of segments having N or more errors, and a maximum count indicating a maximum number of errors in any segment.
[00112] In one embodiment, the DRAM includes a double data rate version 4 extended (DDR4E) compliant synchronous dynamic random access memory device (SDRAM). In one embodiment, the memory segments comprise DRAM rows. In one embodiment, the trigger comprises an ECS command generated by the memory controller. In one embodiment, the trigger comprises a mode register setting set by the memory controller. In one embodiment, the internal controller is further to generate address information for the memory locations responsive to the trigger for the ECS mode. In one embodiment, memory controller is to identify a bank group associated with the trigger for the ECS mode, and wherein the internal controller is further to generate address information for specific rows within the bank group. In one embodiment, the internal controller is to perform single bit error (SBE) ECC for the one or more memory locations in response to the trigger for ECS mode. In one embodiment, N equals 1. In one embodiment, N equals 2. In one embodiment, further comprising the internal controller to store the segment count in a multipurpose register (MPR) of a mode register of the DRAM. In one embodiment, further comprising the internal controller to store the segment count in a multipurpose register (MPR) of a mode register of the SDRAM, and to store an address of a segment having the maximum count. In one embodiment, further comprising one or more of: at least one processor communicatively coupled to the memory controller; a display communicatively coupled to at least one processor; or a network interface communicatively coupled to at least one processor.
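From the memory controller's side, a system with multiple SDRAMs might use the stored counts roughly as sketched below. The sketch rests on assumptions that are not stated in the text above: that the controller reads the MPR contents back after triggering ECS, that one report per device is sufficient, and that per-row error counts can be injected as test data. `EcsReport`, `FakeSdram`, `scrub_all`, and `worst_device` are hypothetical names, and `FakeSdram` is a stand-in for a device rather than a model of real hardware.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EcsReport:
    segment_count: int  # rows with at least N errors
    max_count: int      # most errors found in any one row
    max_row: int        # address of the row holding max_count


class FakeSdram:
    """Stand-in device; errors_per_row must be a non-empty list of injected error counts."""

    def __init__(self, errors_per_row: List[int], threshold_n: int = 1) -> None:
        self.errors_per_row = errors_per_row
        self.threshold_n = threshold_n
        self.mpr: Optional[EcsReport] = None  # models the multipurpose register contents

    def trigger_ecs(self) -> None:
        """Model the internal controller filling the MPR after an ECS pass."""
        worst = max(range(len(self.errors_per_row)), key=self.errors_per_row.__getitem__)
        self.mpr = EcsReport(
            segment_count=sum(e >= self.threshold_n for e in self.errors_per_row),
            max_count=self.errors_per_row[worst],
            max_row=worst,
        )

    def read_mpr(self) -> Optional[EcsReport]:
        return self.mpr


def scrub_all(devices: List[FakeSdram]) -> List[EcsReport]:
    """Trigger ECS on every device, then collect each device's report."""
    reports = []
    for device in devices:
        device.trigger_ecs()
        reports.append(device.read_mpr())
    return reports


def worst_device(reports: List[EcsReport]) -> int:
    """Index of the device whose worst row has the most errors."""
    return max(range(len(reports)), key=lambda i: reports[i].max_count)
```

In this model the controller learns only aggregate counts and one row address per device, not the location of every corrected bit.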
[00113] Flow diagrams as illustrated herein provide examples of sequences of various process actions. The flow diagrams can indicate operations to be executed by a software or firmware routine, as well as physical operations. In one embodiment, a flow diagram can illustrate the state of a finite state machine (FSM), which can be implemented in hardware and/or software. Although shown in a particular sequence or order, unless otherwise specified, the order of the actions can be modified. Thus, the illustrated embodiments should be understood only as an example, and the process can be performed in a different order, and some actions can be performed in parallel. Additionally, one or more actions can be omitted in various embodiments; thus, not all actions are required in every embodiment. Other process flows are possible.
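As one illustration of a flow expressed as a finite state machine in software, the following sketch uses a table-driven FSM. The states, event names, and transitions are assumptions made for the example and are not taken from any figure; they merely show one possible ordering of read, check, correct, and count actions for an ECS operation.

```python
from enum import Enum, auto
from typing import Dict, List, Tuple


class State(Enum):
    IDLE = auto()
    READ = auto()
    CHECK = auto()
    CORRECT = auto()
    COUNT = auto()


# (current state, event) -> next state; all event names are hypothetical.
TRANSITIONS: Dict[Tuple[State, str], State] = {
    (State.IDLE, "ecs_trigger"): State.READ,
    (State.READ, "data_ready"): State.CHECK,
    (State.CHECK, "no_error"): State.COUNT,
    (State.CHECK, "single_bit_error"): State.CORRECT,
    (State.CORRECT, "written_back"): State.COUNT,
    (State.COUNT, "more_locations"): State.READ,
    (State.COUNT, "done"): State.IDLE,
}


def step(state: State, event: str) -> State:
    """Return the next state; events with no table entry leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(events: List[str], state: State = State.IDLE) -> List[State]:
    """Apply a sequence of events and return every state visited, including the start."""
    trace = [state]
    for event in events:
        state = step(state, event)
        trace.append(state)
    return trace
```

Reordering or omitting actions, as the paragraph above allows, corresponds to editing entries in the transition table rather than changing `step` or `run`.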
[00114] To the extent various operations or functions are described herein, they can be described or defined as software code, instructions, configuration, and/or data. The content can be directly executable ("object" or "executable" form), source code, or difference code ("delta" or "patch" code). The software content of the embodiments described herein can be provided via an article of manufacture with the content stored thereon, or via a method of operating a communication interface to send data via the communication interface. A machine readable storage medium can cause a machine to perform the functions or operations described, and includes any mechanism that stores information in a form accessible by a machine (e.g., computing device, electronic system, etc.), such as recordable/non-recordable media (e.g., read only memory (ROM), random access memory (RAM), magnetic disk storage media, optical storage media, flash memory devices, etc.). A communication interface includes any mechanism that interfaces to any of a hardwired, wireless, optical, etc., medium to communicate to another device, such as a memory bus interface, a processor bus interface, an Internet connection, a disk controller, etc. The communication interface can be configured by providing configuration parameters and/or sending signals to prepare the communication interface to provide a data signal describing the software content. The communication interface can be accessed via one or more commands or signals sent to the communication interface.
[00115] Various components described herein can be a means for performing the operations or functions described. Each component described herein includes software, hardware, or a combination of these. The components can be implemented as software modules, hardware modules, special-purpose hardware (e.g., application specific hardware, application specific integrated circuits (ASICs), digital signal processors (DSPs), etc.), embedded controllers, hardwired circuitry, etc.
[00116] Besides what is described herein, various modifications can be made to the disclosed embodiments and implementations of the invention without departing from their scope. Therefore, the illustrations and examples herein should be construed in an illustrative, and not a restrictive sense. The scope of the invention should be measured solely by reference to the claims that follow.

Claims

What is claimed is:
1. A dynamic random access memory device (DRAM), comprising:
a storage array including multiple memory segments, the memory segments including multiple memory locations to store data and error checking and correction (ECC) information associated with the data;
I/O (input/output) circuitry to couple to an associated memory controller, the I/O circuitry to receive a trigger for an error check and scrub (ECS) mode when coupled to the associated memory controller; and
an internal controller on the DRAM, responsive to the trigger for the ECS mode, to read one or more memory locations, perform ECC for the one or more memory locations based on the ECC information, and count error information, the error information including a segment count indicating a number of segments having N or more errors, and a maximum count indicating a maximum number of errors in any segment.
2. The DRAM of claim 1, wherein the DRAM includes a double data rate version 4 extended (DDR4E) compliant synchronous dynamic random access memory device (SDRAM).
3. The DRAM of any of claims 1 to 2, wherein the memory segments comprise DRAM rows.
4. The DRAM of any of claims 1 to 3, wherein the trigger comprises an ECS command generated by the memory controller.
5. The DRAM of any of claims 1 to 3, wherein the trigger comprises a mode register setting set by the memory controller.
6. The DRAM of any of claims 1 to 5, wherein the internal controller is further to generate address information for the memory locations responsive to the trigger for the ECS mode.
7. The DRAM of any of claims 1 to 6, wherein memory controller is to identify a bank group associated with the trigger for the ECS mode, and wherein the internal controller is further to generate address information for specific rows within the bank group.
8. The DRAM of any of claims 1 to 7, wherein the internal controller is to perform single bit error (SBE) ECC for the one or more memory locations in response to the trigger for ECS mode.
9. The DRAM of any of claims 1 to 8, wherein N equals 1.
10. The DRAM of any of claims 1 to 8, wherein N equals 2.
11. The DRAM of any of claims 1 to 10, further comprising the internal controller to store the segment count in a multipurpose register (MPR) of a mode register of the DRAM.
12. The DRAM of any of claims 1 to 11, further comprising the internal controller to store an address of a segment having the maximum count, wherein the internal controller is to store the address in a multipurpose register (MPR) of a mode register of the DRAM.
13. A method for error correction management in a memory subsystem, comprising:
receiving a trigger for an error check and scrub (ECS) mode at a memory device having a storage array including multiple memory segments, the memory segments including multiple memory locations to store data and error checking and correction (ECC) information associated with the data;
reading one or more of the memory locations responsive to receiving the trigger for the ECS mode;
performing ECC for the one or more memory locations based on the ECC information; and
counting error information, the error information including a segment count indicating a number of segments having at least a threshold number of errors, and a maximum count indicating a maximum number of errors in any segment.
14. The method of claim 13, wherein the memory device includes a double data rate version 4 extended (DDR4E) compliant synchronous dynamic random access memory device (SDRAM).
15. The method of any of claims 13 to 14, wherein the memory segments comprise DRAM rows.
16. The method of any of claims 13 to 15, wherein receiving the trigger comprises receiving one or more of an ECS command generated by the memory controller, or a mode register setting set by the memory controller.
17. The method of any of claims 13 to 16, further comprising generating address information for the memory locations responsive to the trigger for the ECS mode.
18. The method of any of claims 13 to 17, wherein the threshold number equals 1 or 2.
19. The method of any of claims 13 to 18, further comprising storing an address of a segment having the maximum count.
20. The method of claim 19, further comprising storing the segment count in a multipurpose register (MPR) of a mode register of the memory device, or storing the address of the segment having the maximum count in an MPR, or storing both the segment count and the address of the segment having the maximum count in an MPR.
21. A system with a memory subsystem, comprising:
a memory controller; and
multiple synchronous dynamic random access memory devices (SDRAMs) in accordance with any embodiment of claims 1 to 12.
22. The system of claim 21, further comprising one or more of:
at least one processor communicatively coupled to the memory controller;
a display communicatively coupled to at least one processor; or
a network interface communicatively coupled to at least one processor.
23. An article of manufacture comprising a computer readable storage medium having content stored thereon to cause execution of operations to execute a method for error correction management in accordance with any of claims 13 to 20.
24. An apparatus for error correction management, comprising means for performing operations to execute a method in accordance with any of claims 13 to 20.
PCT/US2016/045640 2015-08-28 2016-08-04 Memory device error check and scrub mode and error transparency WO2017039948A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP21193306.4A EP3992973A1 (en) 2015-08-28 2016-08-04 Memory device error check and scrub mode providing error transparency
EP16842532.0A EP3341941B1 (en) 2015-08-28 2016-08-04 Memory device error check and scrub mode and error transparency
CN201680050250.3A CN107924705B (en) 2015-08-28 2016-08-04 Dynamic random access memory device, memory controller and memory system

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201562211448P 2015-08-28 2015-08-28
US62/211,448 2015-08-28
US14/998,184 2015-12-26
US14/998,184 US10127101B2 (en) 2015-08-28 2015-12-26 Memory device error check and scrub mode and error transparency

Publications (1)

Publication Number Publication Date
WO2017039948A1 true WO2017039948A1 (en) 2017-03-09

Family

ID=58095521

Family Applications (3)

Application Number Title Priority Date Filing Date
PCT/US2016/045639 WO2017039947A1 (en) 2015-08-28 2016-08-04 Memory device check bit read mode
PCT/US2016/045640 WO2017039948A1 (en) 2015-08-28 2016-08-04 Memory device error check and scrub mode and error transparency
PCT/US2016/045643 WO2017039949A1 (en) 2015-08-28 2016-08-04 Memory device on-die error checking and correcting code

Family Applications Before (1)

Application Number Title Priority Date Filing Date
PCT/US2016/045639 WO2017039947A1 (en) 2015-08-28 2016-08-04 Memory device check bit read mode

Family Applications After (1)

Application Number Title Priority Date Filing Date
PCT/US2016/045643 WO2017039949A1 (en) 2015-08-28 2016-08-04 Memory device on-die error checking and correcting code

Country Status (4)

Country Link
US (4) US10127101B2 (en)
EP (4) EP3341941B1 (en)
CN (4) CN107924705B (en)
WO (3) WO2017039947A1 (en)

Families Citing this family (133)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10445229B1 (en) 2013-01-28 2019-10-15 Radian Memory Systems, Inc. Memory controller with at least one address segment defined for which data is striped across flash memory dies, with a common address offset being used to obtain physical addresses for the data in each of the dies
US11249652B1 (en) 2013-01-28 2022-02-15 Radian Memory Systems, Inc. Maintenance of nonvolatile memory on host selected namespaces by a common memory controller
US9823966B1 (en) 2013-11-11 2017-11-21 Rambus Inc. Memory component with error-detect-correct code interface
US9542118B1 (en) 2014-09-09 2017-01-10 Radian Memory Systems, Inc. Expositive flash memory control
US9929750B2 (en) * 2015-09-08 2018-03-27 Toshiba Memory Corporation Memory system
US10162702B2 (en) * 2016-02-01 2018-12-25 Lattice Semiconductor Corporation Segmented error coding for block-based memory
US10628248B2 (en) * 2016-03-15 2020-04-21 International Business Machines Corporation Autonomous dram scrub and error counting
CN109074851B (en) * 2016-05-02 2023-09-22 英特尔公司 Internal Error Checksum Correction (ECC) utilizing additional system bits
KR102504176B1 (en) * 2016-06-23 2023-03-02 에스케이하이닉스 주식회사 Semiconductor device
US10120749B2 (en) * 2016-09-30 2018-11-06 Intel Corporation Extended application of error checking and correction code in memory
KR20180038109A (en) * 2016-10-05 2018-04-16 삼성전자주식회사 Electronic device including monitoring circuit and storage device included therein
KR20180069179A (en) * 2016-12-14 2018-06-25 에스케이하이닉스 주식회사 Memory system and error correcting method of the same
KR20180090124A (en) * 2017-02-02 2018-08-10 에스케이하이닉스 주식회사 Memory system and operating method of memory system
US10275306B1 (en) * 2017-02-09 2019-04-30 Cadence Design Systems, Inc. System and method for memory control having adaptively split addressing of error-protected data words in memory transactions for inline storage configurations
US10303543B1 (en) * 2017-02-09 2019-05-28 Cadence Design Systems, Inc. System and method for memory control having address integrity protection for error-protected data words of memory transactions
EP3370152B1 (en) 2017-03-02 2019-12-25 INTEL Corporation Integrated error checking and correction (ecc) in memory devices with fixed bandwidth interfaces
JP2018160166A (en) * 2017-03-23 2018-10-11 東芝メモリ株式会社 Memory system, and resistance change type memory
US10853163B2 (en) * 2017-04-28 2020-12-01 Qualcomm Incorporated Optimized error-correcting code (ECC) for data protection
KR102258140B1 (en) 2017-07-06 2021-05-28 삼성전자주식회사 Error correction circuit of semiconductor memory device, semiconductor memory device and memory system
US10346244B2 (en) * 2017-08-10 2019-07-09 Micron Technology, Inc. Shared address counters for multiple modes of operation in a memory device
KR20190033410A (en) * 2017-09-21 2019-03-29 삼성전자주식회사 Apparatus supporting error correction code and method for testing the same
DE102018122826A1 (en) * 2017-09-21 2019-03-21 Samsung Electronics Co., Ltd. Apparatus for supporting an error correction code and test method therefor
US10908995B2 (en) 2017-09-29 2021-02-02 Nvidia Corporation Securing against errors in an error correcting code (ECC) implemented in an automotive system
JP6861611B2 (en) * 2017-11-07 2021-04-21 ルネサスエレクトロニクス株式会社 Semiconductor devices and semiconductor systems equipped with them
KR20190054533A (en) 2017-11-14 2019-05-22 삼성전자주식회사 Semiconductor memory devices, memory systems including the same and methods of operating semiconductor memory devices
US10691533B2 (en) * 2017-12-12 2020-06-23 Micron Technology, Inc. Error correction code scrub scheme
CN108334700B (en) * 2018-02-05 2021-04-02 武汉科技大学 Equivalent circuit of fractional order memory container
US10853168B2 (en) * 2018-03-28 2020-12-01 Samsung Electronics Co., Ltd. Apparatus to insert error-correcting coding (ECC) information as data within dynamic random access memory (DRAM)
US10733046B2 (en) * 2018-04-20 2020-08-04 Micron Technology, Inc. Transaction metadata
KR102553780B1 (en) * 2018-05-10 2023-07-10 에스케이하이닉스 주식회사 Memory device, memory system including the same and operation method of the memory system
CN111771192A (en) 2018-05-11 2020-10-13 拉姆伯斯公司 Efficient storage of error correction code information
US11379155B2 (en) 2018-05-24 2022-07-05 Alibaba Group Holding Limited System and method for flash storage management using multiple open page stripes
WO2020000136A1 (en) 2018-06-25 2020-01-02 Alibaba Group Holding Limited System and method for managing resources of a storage device and quantifying the cost of i/o requests
US10725941B2 (en) * 2018-06-30 2020-07-28 Western Digital Technologies, Inc. Multi-device storage system with hosted services on peer storage devices
US11232196B2 (en) * 2018-07-09 2022-01-25 Arm Limited Tracking events of interest to mitigate attacks
US11361111B2 (en) 2018-07-09 2022-06-14 Arm Limited Repetitive side channel attack countermeasures
US11074126B2 (en) 2018-07-12 2021-07-27 Micron Technology, Inc. Methods for error count reporting with scaled error count information, and memory devices employing the same
US11221910B2 (en) 2018-07-24 2022-01-11 Micron Technology, Inc. Media scrubber in memory system
US10685736B2 (en) * 2018-07-24 2020-06-16 Dell Products, L.P. Maintaining highest performance of DDR5 channel with marginal signal integrity
US11327929B2 (en) 2018-09-17 2022-05-10 Alibaba Group Holding Limited Method and system for reduced data movement compression using in-storage computing and a customized file system
DE102018126685B3 (en) * 2018-10-25 2019-10-10 Infineon Technologies Ag Processing of data
US10636476B2 (en) 2018-11-01 2020-04-28 Intel Corporation Row hammer mitigation with randomization of target row selection
US11427010B2 (en) * 2018-12-03 2022-08-30 Hewlett-Packard Development Company, L.P. Logic circuitry
KR20200076846A (en) * 2018-12-20 2020-06-30 에스케이하이닉스 주식회사 Apparatuses that detect error of data stored in memory device and operating method thereof
US11061735B2 (en) 2019-01-02 2021-07-13 Alibaba Group Holding Limited System and method for offloading computation to storage nodes in distributed system
CN109857616B (en) * 2019-01-25 2021-05-18 山东华芯半导体有限公司 DRAM controller bandwidth efficiency detection method based on instruction
US11016781B2 (en) * 2019-04-26 2021-05-25 Samsung Electronics Co., Ltd. Methods and memory modules for enabling vendor specific functionalities
US11409544B2 (en) * 2019-05-07 2022-08-09 Microsoft Technology Licensing, Llc Dynamically-configurable baseboard management controller
US11182234B2 (en) 2019-05-10 2021-11-23 Arm Limited Tracking events of interest
EP3748637A1 (en) * 2019-06-07 2020-12-09 IHP GmbH - Innovations for High Performance Microelectronics / Leibniz-Institut für innovative Mikroelektronik Electronic circuit with integrated seu monitor
KR20200142213A (en) 2019-06-12 2020-12-22 삼성전자주식회사 Error correction circuit of semiconductor memory device, semiconductor memory device and memory system
KR20200144724A (en) 2019-06-19 2020-12-30 삼성전자주식회사 Semiconductor memory devices and memory systems including the same
US11237903B2 (en) * 2019-06-25 2022-02-01 Intel Corporation Technologies for providing ECC pre-provisioning and handling for cross-point memory and compute operations
US10860223B1 (en) 2019-07-18 2020-12-08 Alibaba Group Holding Limited Method and system for enhancing a distributed storage system by decoupling computation and network tasks
CN110444243A (en) * 2019-07-31 2019-11-12 至誉科技(武汉)有限公司 Store test method, system and the storage medium of equipment read error error correcting capability
US11403172B2 (en) * 2019-08-05 2022-08-02 Cypress Semiconductor Corporation Methods for error detection and correction and corresponding systems and devices for the same
EP3772840B1 (en) * 2019-08-06 2023-03-15 Nxp B.V. A security module for a can node
US11537432B2 (en) 2019-08-15 2022-12-27 Cisco Technology, Inc. Dynamic data-plane resource shadowing
US11599424B2 (en) 2019-08-15 2023-03-07 Cisco Technology, Inc. Dynamic hardware resource shadowing and memory error protection
KR20210026201A (en) * 2019-08-29 2021-03-10 삼성전자주식회사 Semiconductor memory devices, memory systems including the same and methods of controlling repair of the same
CN110718263B (en) * 2019-09-09 2021-08-10 无锡江南计算技术研究所 Efficient sectional test system and method for chip access path
KR20210034726A (en) 2019-09-20 2021-03-31 삼성전자주식회사 Memory module, error correcting method in memory controllor for controlling the same and computing system having the same
US11617282B2 (en) 2019-10-01 2023-03-28 Alibaba Group Holding Limited System and method for reshaping power budget of cabinet to facilitate improved deployment density of servers
US11145351B2 (en) * 2019-11-07 2021-10-12 SK Hynix Inc. Semiconductor devices
US11354189B2 (en) 2019-11-07 2022-06-07 SK Hynix Inc. Semiconductor devices and semiconductor systems including the same
KR20210055865A (en) 2019-11-07 2021-05-18 에스케이하이닉스 주식회사 Semiconductor device and semiconductor system
US11249843B2 (en) 2019-11-07 2022-02-15 SK Hynix Inc. Semiconductor devices and semiconductor systems including the same
US11175984B1 (en) * 2019-12-09 2021-11-16 Radian Memory Systems, Inc. Erasure coding techniques for flash memory
US11611358B2 (en) * 2019-12-24 2023-03-21 Kioxia Corporation Systems and methods for detecting or preventing false detection of three error bits by SEC
KR20210085284A (en) 2019-12-30 2021-07-08 삼성전자주식회사 PIM memory device, computing system including PIM memory device and method for operating PIM memory device
US11449455B2 (en) 2020-01-15 2022-09-20 Alibaba Group Holding Limited Method and system for facilitating a high-capacity object storage system with configuration agility and mixed deployment flexibility
US11379447B2 (en) 2020-02-06 2022-07-05 Alibaba Group Holding Limited Method and system for enhancing IOPS of a hard disk drive system based on storing metadata in host volatile memory and data in non-volatile memory using a shared controller
US11436082B2 (en) * 2020-02-11 2022-09-06 Micron Technology, Inc. Internal error correction for memory devices
TWI708248B (en) * 2020-02-11 2020-10-21 華邦電子股份有限公司 Memory device and method of adjusting parameter used of memory device
US11221800B2 (en) * 2020-03-02 2022-01-11 Micron Technology, Inc. Adaptive and/or iterative operations in executing a read command to retrieve data from memory cells
US11221913B2 (en) * 2020-03-11 2022-01-11 Micron Technology, Inc. Error check and scrub for semiconductor memory device
US11200114B2 (en) * 2020-03-17 2021-12-14 Alibaba Group Holding Limited System and method for facilitating elastic error correction code in memory
US11449386B2 (en) 2020-03-20 2022-09-20 Alibaba Group Holding Limited Method and system for optimizing persistent memory on data retention, endurance, and performance for host memory
US11410713B2 (en) * 2020-04-06 2022-08-09 Micron Technology, Inc. Apparatuses and methods for detecting illegal commands and command sequences
JP2021168431A (en) * 2020-04-09 2021-10-21 ミネベアミツミ株式会社 Checksum addition method and checksum addition device
US11301173B2 (en) 2020-04-20 2022-04-12 Alibaba Group Holding Limited Method and system for facilitating evaluation of data access frequency and allocation of storage device resources
US11385833B2 (en) 2020-04-20 2022-07-12 Alibaba Group Holding Limited Method and system for facilitating a light-weight garbage collection with a reduced utilization of resources
US11281575B2 (en) 2020-05-11 2022-03-22 Alibaba Group Holding Limited Method and system for facilitating data placement and control of physical addresses with multi-queue I/O blocks
US11494115B2 (en) 2020-05-13 2022-11-08 Alibaba Group Holding Limited System method for facilitating memory media as file storage device based on real-time hashing by performing integrity check with a cyclical redundancy check (CRC)
US11461262B2 (en) 2020-05-13 2022-10-04 Alibaba Group Holding Limited Method and system for facilitating a converged computation and storage node in a distributed storage system
US11314589B2 (en) 2020-05-15 2022-04-26 Intel Corporation Read retry to selectively disable on-die ECC
US11556277B2 (en) 2020-05-19 2023-01-17 Alibaba Group Holding Limited System and method for facilitating improved performance in ordering key-value storage with input/output stack simplification
US11507499B2 (en) 2020-05-19 2022-11-22 Alibaba Group Holding Limited System and method for facilitating mitigation of read/write amplification in data compression
KR20210147132A (en) 2020-05-27 2021-12-07 삼성전자주식회사 Memory device and memory module comprising memory device
US11170869B1 (en) 2020-06-04 2021-11-09 Western Digital Technologies, Inc. Dual data protection in storage devices
US11322218B2 (en) * 2020-06-08 2022-05-03 Micron Technology, Inc. Error control for memory device
US11263132B2 (en) 2020-06-11 2022-03-01 Alibaba Group Holding Limited Method and system for facilitating log-structure data organization
KR20210154277A (en) * 2020-06-11 2021-12-21 삼성전자주식회사 Memory module and operating method thereof
US11422931B2 (en) 2020-06-17 2022-08-23 Alibaba Group Holding Limited Method and system for facilitating a physically isolated storage unit for multi-tenancy virtualization
US11354200B2 (en) 2020-06-17 2022-06-07 Alibaba Group Holding Limited Method and system for facilitating data recovery and version rollback in a storage device
US11601137B2 (en) 2020-06-18 2023-03-07 Intel Corporation ECC memory chip encoder and decoder
US11438015B2 (en) * 2020-07-10 2022-09-06 Taiwan Semiconductor Manufacturing Company, Ltd. Two-level error correcting code with sharing of check-bits
US11354233B2 (en) 2020-07-27 2022-06-07 Alibaba Group Holding Limited Method and system for facilitating fast crash recovery in a storage device
US11372774B2 (en) 2020-08-24 2022-06-28 Alibaba Group Holding Limited Method and system for a solid state drive with on-chip memory integration
WO2022066178A1 (en) * 2020-09-26 2022-03-31 Intel Corporation Adaptive internal memory error scrubbing and error handling
KR20220044015A (en) * 2020-09-29 2022-04-06 삼성전자주식회사 Controller for preventing uncorrectable memory device, memory device having the same, and operating method thereof
US11803180B2 (en) 2020-10-15 2023-10-31 Ethernovia Inc. Determining diagnostic coverage for achieving functional safety
US11734966B1 (en) 2020-10-15 2023-08-22 Ethernovia Inc. Recursive system layer analysis for achieving functional safety
CN112349342B (en) * 2020-11-05 2024-03-22 海光信息技术股份有限公司 Maintenance device, method, equipment and storage medium for maintaining DDR5 memory subsystem
KR20220081644A (en) 2020-12-09 2022-06-16 삼성전자주식회사 Memory device and memory system including the same
US11487465B2 (en) 2020-12-11 2022-11-01 Alibaba Group Holding Limited Method and system for a local storage engine collaborating with a solid state drive controller
KR20220090794A (en) * 2020-12-23 2022-06-30 삼성전자주식회사 Memory device, controller for controlling the same, memory system having the same, and operating method thereof
US11734115B2 (en) 2020-12-28 2023-08-22 Alibaba Group Holding Limited Method and system for facilitating write latency reduction in a queue depth of one scenario
KR20220094489A (en) 2020-12-29 2022-07-06 삼성전자주식회사 Semiconductor memory devices and methods of operating the same
US11416365B2 (en) 2020-12-30 2022-08-16 Alibaba Group Holding Limited Method and system for open NAND block detection and correction in an open-channel SSD
US11409601B1 (en) * 2021-01-26 2022-08-09 Micron Technology, Inc. Memory device protection
US11573854B2 (en) * 2021-02-02 2023-02-07 Nvidia Corporation Techniques for data scrambling on a memory interface
US11726699B2 (en) 2021-03-30 2023-08-15 Alibaba Singapore Holding Private Limited Method and system for facilitating multi-stream sequential read performance improvement with reduced read amplification
US11461173B1 (en) 2021-04-21 2022-10-04 Alibaba Singapore Holding Private Limited Method and system for facilitating efficient data compression based on error correction code and reorganization of data placement
US11625295B2 (en) * 2021-05-10 2023-04-11 Micron Technology, Inc. Operating memory device in performance mode
US11476874B1 (en) 2021-05-14 2022-10-18 Alibaba Singapore Holding Private Limited Method and system for facilitating a storage server with hybrid memory for journaling and data storage
US20220405601A1 (en) * 2021-06-16 2022-12-22 Western Digital Technologies, Inc. Enhanced digital signal processor (dsp) nand flash
US11579971B2 (en) * 2021-07-14 2023-02-14 Micron Technology, Inc. Apparatuses, systems, and methods for forced error check and scrub readouts
US11664084B2 (en) * 2021-08-02 2023-05-30 Micron Technology, Inc. Memory device on-die ECC data
KR20230072336A (en) * 2021-11-17 2023-05-24 에스케이하이닉스 주식회사 Semiconductor device
KR20230073915A (en) * 2021-11-19 2023-05-26 에스케이하이닉스 주식회사 Error check and scrub operation method and semiconductor system using the same
WO2023099933A1 (en) * 2021-12-02 2023-06-08 Micron Technology, Inc. Memory device with data scrubbing capability and methods
US11797215B2 (en) * 2021-12-09 2023-10-24 SK Hynix Inc. Memory device and memory system performing error check and scrub operation
US20230214119A1 (en) * 2021-12-30 2023-07-06 Micron Technology, Inc. Data stripe protection
US20230231574A1 (en) * 2022-01-20 2023-07-20 Micron Technology, Inc. Syndrome check functionality to differentiate between error types
US11841765B2 (en) * 2022-03-31 2023-12-12 Micron Technology, Inc. Scrub operations with row error information
CN115190069B (en) * 2022-04-26 2023-12-05 中国人民解放军国防科技大学 High-performance network-on-chip fault-tolerant router device
US20230350748A1 (en) * 2022-04-27 2023-11-02 Micron Technology, Inc. Apparatuses, systems, and methods for per row error scrub information
US20230352112A1 (en) * 2022-04-27 2023-11-02 Micron Technology, Inc. Apparatuses, systems, and methods for per row error scrub information registers
CN117238356A (en) * 2022-06-08 2023-12-15 成都华为技术有限公司 Memory module and electronic equipment
US20230409426A1 (en) * 2022-06-16 2023-12-21 Advanced Micro Devices, Inc. Host-level error detection and fault correction
CN117636997A (en) * 2022-08-17 2024-03-01 长鑫存储技术有限公司 Counting circuit and memory
CN115295040B (en) * 2022-10-08 2023-06-02 睿力集成电路有限公司 Control circuit, control method and semiconductor memory

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070234182A1 (en) * 2006-03-31 2007-10-04 Wickeraad John A Error checking and correction (ECC) system and method
US20090249169A1 (en) 2008-03-28 2009-10-01 Bains Kuljit S Systems, methods, and apparatuses to save memory self-refresh power
US7650558B2 (en) * 2005-08-16 2010-01-19 Intel Corporation Systems, methods, and apparatuses for using the same memory type for both error check and non-error check memory systems
US20130132797A1 (en) * 2008-11-17 2013-05-23 Elpida Memory, Inc. Control method for a semiconductor memory device
US20140211579A1 (en) * 2013-01-30 2014-07-31 John V. Lovelace Apparatus, method and system to determine memory access command timing based on error detection

Family Cites Families (63)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS58105500A (en) * 1981-11-23 1983-06-23 Sperry Corporation Trouble detection system and method for memory driving circuit
US5307356A (en) * 1990-04-16 1994-04-26 International Business Machines Corporation Interlocked on-chip ECC system
US5633767A (en) 1995-06-06 1997-05-27 International Business Machines Corporation Adaptive and in-situ load/unload damage estimation and compensation
KR100383404B1 (en) 1996-08-16 2003-07-16 동경 엘렉트론 디바이스 주식회사 Semiconductor memory device having error detection and correction
US6052818A (en) * 1998-02-27 2000-04-18 International Business Machines Corporation Method and apparatus for ECC bus protection in a computer system with non-parity memory
US6487685B1 (en) 1999-09-30 2002-11-26 Silicon Graphics, Inc. System and method for minimizing error correction code bits in variable sized data formats
US6256757B1 (en) 2000-01-24 2001-07-03 Credence Systems Corporation Apparatus for testing memories with redundant storage elements
DE10017543A1 (en) 2000-04-08 2001-10-11 Bosch Gmbh Robert Procedures for error detection and error healing
US6785856B1 (en) 2000-12-07 2004-08-31 Advanced Micro Devices, Inc. Internal self-test circuit for a memory array
US6957378B2 (en) 2001-06-04 2005-10-18 Kabushiki Kaisha Toshiba Semiconductor memory device
US7003698B2 (en) 2002-06-29 2006-02-21 Intel Corporation Method and apparatus for transport of debug events between computer system components
US7096407B2 (en) 2003-02-18 2006-08-22 Hewlett-Packard Development Company, L.P. Technique for implementing chipkill in a memory system
US7234099B2 (en) 2003-04-14 2007-06-19 International Business Machines Corporation High reliability memory module with a fault tolerant address and command bus
US6853602B2 (en) * 2003-05-09 2005-02-08 Taiwan Semiconductor Manufacturing Company, Ltd. Hiding error detecting/correcting latency in dynamic random access memory (DRAM)
JP3889391B2 (en) * 2003-11-06 2007-03-07 ローム株式会社 Memory device and display device
US7370260B2 (en) * 2003-12-16 2008-05-06 Freescale Semiconductor, Inc. MRAM having error correction code circuitry and method therefor
US7376887B2 (en) 2003-12-22 2008-05-20 International Business Machines Corporation Method for fast ECC memory testing by software including ECC check byte
JP4401319B2 (en) * 2005-04-07 2010-01-20 株式会社日立製作所 DRAM stacked package and DRAM stacked package test and relief method
US7227797B2 (en) 2005-08-30 2007-06-05 Hewlett-Packard Development Company, L.P. Hierarchical memory correction system and method
US7739576B2 (en) * 2006-08-31 2010-06-15 Micron Technology, Inc. Variable strength ECC
US20090132876A1 (en) 2007-11-19 2009-05-21 Ronald Ernest Freking Maintaining Error Statistics Concurrently Across Multiple Memory Ranks
KR101001446B1 (en) 2008-12-24 2010-12-14 주식회사 하이닉스반도체 Nonvolatile Memory Device and Operating Method thereof
KR20100102925A (en) 2009-03-12 2010-09-27 삼성전자주식회사 Non-volatile memory device and memory system generating read reclaim signal
US8230305B2 (en) 2009-04-02 2012-07-24 Micron Technology, Inc. Extended single-bit error correction and multiple-bit error detection
US9170879B2 (en) 2009-06-24 2015-10-27 Headway Technologies, Inc. Method and apparatus for scrubbing accumulated data errors from a memory system
US8495467B1 (en) 2009-06-30 2013-07-23 Micron Technology, Inc. Switchable on-die memory error correcting engine
EP2483779B1 (en) * 2009-09-28 2015-11-11 Nvidia Corporation Error detection and correction for external dram
US8862973B2 (en) * 2009-12-09 2014-10-14 Intel Corporation Method and system for error management in a memory device
US8464125B2 (en) 2009-12-10 2013-06-11 Intel Corporation Instruction-set architecture for programmable cyclic redundancy check (CRC) computations
US8230255B2 (en) 2009-12-15 2012-07-24 International Business Machines Corporation Blocking write acces to memory modules of a solid state drive
US8438344B2 (en) * 2010-03-12 2013-05-07 Texas Instruments Incorporated Low overhead and timing improved architecture for performing error checking and correction for memories and buses in system-on-chips, and other circuits, systems and processes
US8640005B2 (en) * 2010-05-21 2014-01-28 Intel Corporation Method and apparatus for using cache memory in a system that supports a low power state
JP2012050008A (en) 2010-08-30 2012-03-08 Toshiba Corp Error detection/correction method and semiconductor memory device
US8914687B2 (en) * 2011-04-15 2014-12-16 Advanced Micro Devices, Inc. Providing test coverage of integrated ECC logic en embedded memory
US20120297256A1 (en) * 2011-05-20 2012-11-22 Qualcomm Incorporated Large Ram Cache
US8645811B2 (en) 2011-10-27 2014-02-04 Dell Products L.P. System and method for selective error checking
US8892986B2 (en) 2012-03-08 2014-11-18 Micron Technology, Inc. Apparatuses and methods for combining error coding and modulation schemes
KR101938210B1 (en) 2012-04-18 2019-01-15 삼성전자주식회사 Operating method of memory systme including nand flash memory, variable resistance memory and controller
EP2677429A1 (en) * 2012-06-18 2013-12-25 Renesas Electronics Europe Limited Error correction
US9037949B1 (en) 2012-06-21 2015-05-19 Rambus Inc. Error correction in a memory device
CN103593252B (en) * 2012-08-14 2017-06-13 旺宏电子股份有限公司 The memory detected with dynamic error and corrected
US9009566B2 (en) 2012-09-12 2015-04-14 Macronix International Co., Ltd. Outputting information of ECC corrected bits
KR102002925B1 (en) * 2012-11-01 2019-07-23 삼성전자주식회사 Memory module, memory system havint the same, and driving method thereof
US9070479B2 (en) 2013-01-21 2015-06-30 Sandisk Technologies Inc. Systems and methods of updating read voltages
JP2014157391A (en) 2013-02-14 2014-08-28 Sony Corp Storage control device, storage device, information processing system, and storage control method
US8996953B2 (en) 2013-03-01 2015-03-31 International Business Machines Corporation Self monitoring and self repairing ECC
US9032264B2 (en) 2013-03-21 2015-05-12 Kabushiki Kaisha Toshiba Test method for nonvolatile memory
CN103218275B (en) * 2013-03-28 2015-11-25 华为技术有限公司 Error in data restorative procedure, device and equipment
KR20150020385A (en) * 2013-08-13 2015-02-26 에스케이하이닉스 주식회사 Data storage device, operating method thereof and data processing system including the same
US20150067437A1 (en) 2013-08-30 2015-03-05 Kuljit S. Bains Apparatus, method and system for reporting dynamic random access memory error information
CN108831512A (en) 2013-10-15 2018-11-16 拉姆伯斯公司 Load reduced memory module
CN103744744B (en) * 2014-02-08 2017-08-25 威盛电子股份有限公司 The data verification method of data memory device and volatile memory
US9299457B2 (en) 2014-02-23 2016-03-29 Qualcomm Incorporated Kernel masking of DRAM defects
US9495232B2 (en) 2014-03-28 2016-11-15 Intel IP Corporation Error correcting (ECC) memory compatibility
US20150331732A1 (en) 2014-05-13 2015-11-19 Rambus Inc. Memory device having storage for an error code correction event count
US9367392B2 (en) 2014-08-01 2016-06-14 Winbond Electronics Corporation NAND flash memory having internal ECC processing and method of operation thereof
US9766972B2 (en) 2014-08-07 2017-09-19 Pure Storage, Inc. Masking defective bits in a storage array
KR102204391B1 (en) 2014-08-18 2021-01-18 삼성전자주식회사 Memory device having sharable ECC (Error Correction Code) cell array
CN104575585A (en) * 2015-01-15 2015-04-29 西安华芯半导体有限公司 DRAM expansion structure and DRAM expansion method
US9606851B2 (en) * 2015-02-02 2017-03-28 International Business Machines Corporation Error monitoring of a memory device containing embedded error correction
US9940457B2 (en) 2015-02-13 2018-04-10 International Business Machines Corporation Detecting a cryogenic attack on a memory device with embedded error correction
US9691505B2 (en) 2015-03-27 2017-06-27 Intel Corporation Dynamic application of error correction code (ECC) based on error type
US9811420B2 (en) 2015-03-27 2017-11-07 Intel Corporation Extracting selective information from on-die dynamic random access memory (DRAM) error correction code (ECC)

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3341941A4

Also Published As

Publication number Publication date
EP3341941A1 (en) 2018-07-04
US10810079B2 (en) 2020-10-20
US20190073261A1 (en) 2019-03-07
US20170060680A1 (en) 2017-03-02
EP3341939A1 (en) 2018-07-04
WO2017039947A1 (en) 2017-03-09
CN107924698B (en) 2022-06-21
WO2017039949A1 (en) 2017-03-09
EP3341939B1 (en) 2020-07-15
US10127101B2 (en) 2018-11-13
EP3341941B1 (en) 2021-12-22
EP3341844B1 (en) 2020-10-21
EP3341941A4 (en) 2019-05-01
CN107924705B (en) 2022-08-16
CN113808658A (en) 2021-12-17
US20170063394A1 (en) 2017-03-02
CN107924349A (en) 2018-04-17
CN107924349B (en) 2021-11-23
US9842021B2 (en) 2017-12-12
CN107924698A (en) 2018-04-17
EP3992973A1 (en) 2022-05-04
EP3341844A1 (en) 2018-07-04
EP3341939A4 (en) 2019-04-24
EP3341844A4 (en) 2019-05-22
US20170060681A1 (en) 2017-03-02
US9817714B2 (en) 2017-11-14
CN107924705A (en) 2018-04-17

Similar Documents

Publication Publication Date Title
US10810079B2 (en) Memory device error check and scrub mode and error transparency
US10636476B2 (en) Row hammer mitigation with randomization of target row selection
US10496473B2 (en) Extracting selective information from on-die dynamic random access memory (DRAM) error correction code (ECC)
US10572343B2 (en) Targeted aliasing single error correction (SEC) code
US10872011B2 (en) Internal error checking and correction (ECC) with extra system bits
US11314589B2 (en) Read retry to selectively disable on-die ECC
US11704194B2 (en) Memory wordline isolation for improvement in reliability, availability, and scalability (RAS)
WO2022066178A1 (en) Adaptive internal memory error scrubbing and error handling
US11966286B2 (en) Read retry to selectively disable on-die ECC

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 16842532; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
WWE WIPO information: entry into national phase (Ref document number: 2016842532; Country of ref document: EP)