CN105843699B - Dynamic random access memory device and method for error monitoring and correction - Google Patents
- Publication number: CN105843699B
- Application number: CN201610064309.6A
- Authority: CN (China)
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/08—Error detection or correction by redundancy in data representation, e.g. by using checking codes
- G06F11/10—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
- G06F11/1008—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices
- G06F11/1044—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices with specific ECC/EDC distribution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0793—Remedial or corrective actions
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Techniques For Improving Reliability Of Storages (AREA)
- Debugging And Monitoring (AREA)
Abstract
Embodiments of the disclosure provide a method for monitoring the integrity of a dynamic random access memory (DRAM) device with embedded error-correcting code (ECC) and for predicting its failure. Additional registers are embedded on the DRAM device to store information about the DRAM, for example the number and location of the soft errors the device has detected. When the DRAM device detects a soft error, the information stored in the additional registers is updated. A controller compares the information stored in the additional registers with associated thresholds. In some embodiments, after comparing the information with the associated thresholds, the controller may decide whether to schedule a repair action. In other embodiments, the controller may decide whether to warn the memory controller that the DRAM may fail.
Description
Technical field
The present disclosure relates generally to the field of computing hardware. More particularly, it relates to coupling a dynamic random access memory (DRAM) device with embedded error-correcting code (ECC) to registers that record and repair memory errors, and to monitoring the DRAM device for low-temperature attacks.
Background
Dynamic memory cells store charge in capacitors. The size of these capacitors keeps shrinking to accommodate ever-increasing storage requirements. As the capacitors become smaller, dynamic memory cells become increasingly susceptible to single-cell soft errors caused by reduced cell retention time, electrical or magnetic interference, and background radiation. To control the increase in soft errors, some DRAM manufacturers embed error-correcting code (ECC) directly on the DRAM device rather than relying on the central processing unit (CPU) or the system memory controller.
Over time, DRAM capacitors lose their charge, so they must be refreshed to avoid losing information. Many DRAM devices have a maximum refresh interval on the order of milliseconds. The rate at which a DRAM capacitor loses its charge may depend on temperature. If DRAM capacitors are cooled suddenly, their charge may be retained longer than at normal operating temperature, possibly for minutes to hours rather than the usual few seconds.
Summary of the invention
Embodiments of the invention disclose an apparatus and a method for monitoring the integrity of a dynamic random access memory (DRAM) device with embedded error-correcting code (ECC) and for predicting its failure. In one embodiment, the disclosure includes a DRAM device with embedded ECC. The DRAM device further includes a register that stores a running count of the number of errors detected and a register group that stores the memory addresses of the errors detected. The DRAM device further includes an ECC controller, where the controller is configured to use the ECC to perform error detection and correction (EDAC).
In another embodiment, the disclosure includes a method for recording and correcting soft errors in a DRAM device with embedded ECC. The DRAM device performs an ECC check on a word to determine whether the word contains any soft error. When an error is detected, an error count stored in a register on the DRAM device is incremented, and the memory address corresponding to the location of the error is saved in a register group on the DRAM device.
In further embodiments, the disclosure includes a method for predicting failure in a DRAM. The DRAM device receives a set of memory information about the DRAM. The DRAM device processes the set of memory information to determine a set of error indicators. The DRAM device then compares the set of error indicators with their associated thresholds and, if at least one of the error indicators exceeds its associated threshold, sends an alert to the memory controller.
In further embodiments, the disclosure includes a method for detecting a low-temperature attack on a dynamic random access memory (DRAM) device and for responding to the attack. A set of error indicators is determined by processing a set of memory information using a set of decision parameters. The error indicators are then compared with a set of attack signatures. If the error indicators match the set of attack signatures, access to the DRAM device is disabled.
Further embodiments of the disclosure relate generally to systems and computer program products for monitoring a DRAM device for conditions that indicate a low-temperature attack and for responding to the low-temperature attack.
The above summary is not intended to describe each illustrated embodiment or every implementation of the disclosure.
Brief description of the drawings
The drawings included in the present application are incorporated into, and form part of, the specification. They illustrate embodiments of the invention and, together with the description, serve to explain its principles. The drawings illustrate only typical embodiments of the invention and should not be construed as limiting it.
Fig. 1 is a high-level block diagram of an example computer system, according to embodiments of the disclosure, that may be used to implement one or more of the methods, tools, modules, and any related functions described herein.
Fig. 2 is a flowchart of a method for recording errors in a dynamic random access memory (DRAM) device with embedded ECC, according to embodiments of the disclosure.
Fig. 3 is a flowchart of a method for identifying the need to take a repair action in a DRAM device with embedded ECC, according to embodiments of the disclosure.
Fig. 4 is a structure diagram of an example DRAM device with embedded ECC, an error logging unit, and a fault detection unit, according to embodiments of the disclosure.
Fig. 5 is a flowchart of a method for predicting failure in a DRAM device with embedded ECC, according to embodiments of the disclosure.
Fig. 6 is a structure diagram of an example DRAM device with embedded ECC, an error logging unit, and a low-temperature attack detection unit, according to embodiments of the disclosure.
Fig. 7 is a flowchart of a method for detecting a low-temperature attack on a DRAM device with embedded ECC, according to embodiments of the disclosure.
Detailed description
The present disclosure relates generally to the field of computing hardware. More particularly, it relates to coupling a dynamic random access memory (DRAM) device with embedded error-correcting code (ECC) to registers that record errors and identify the need to repair the memory, and to monitoring the DRAM device for low-temperature attacks. The disclosure is not limited to such applications, however, and its various aspects may be understood through the discussion of the different examples in this specification.
DRAM is a type of random access memory that stores each bit of data in a separate capacitor within a memory cell of an integrated circuit. The capacitor can be charged or discharged to represent the two binary values (1 or 0) of a data bit. Occasionally a bit spontaneously flips from one binary value to the other, causing a soft error. Electrical or magnetic interference, alpha particles striking a cell, and background radiation may all cause soft errors.
The consequences of a soft error in memory depend on the system. In a system without ECC, a soft error may have no noticeable consequence, or it may cause a system crash or data corruption. For example, suppose a spreadsheet that stores numbers in ASCII format is loaded into an application's memory, the digit "8" is entered into a cell, and the spreadsheet is then saved. The "8" may be represented by the binary bit sequence 00111000, where each bit of the sequence is stored in a separate memory cell. If an alpha particle strikes the cell that stores the least significant (rightmost) bit of the sequence before the spreadsheet is saved, flipping that bit from 0 to 1, then the next time the spreadsheet is loaded into memory, the cell that previously contained the digit "8" may now contain the digit "9". Although such a change does not always destabilize a system, the effect is unacceptable for systems that run scientific and financial computing applications and for file servers.
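The spreadsheet example can be reproduced in a few lines. The following is a minimal Python sketch of the single-bit flip only; the variable names are illustrative and are not part of the disclosure.

```python
# ASCII "8" is stored as the bit pattern 00111000.
stored = ord("8")

# Model a soft error: flip the least significant (rightmost) bit with XOR.
flipped = stored ^ 0b00000001

print(format(stored, "08b"), chr(stored))    # 00111000 8
print(format(flipped, "08b"), chr(flipped))  # 00111001 9
```

A single flipped cell is enough to turn one valid digit into another, which is why a system without ECC cannot tell that anything went wrong.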
Systems that cannot tolerate data corruption may use ECC memory to correct errors as they occur. ECC memory may use additional memory chips to make room for added check bits. When a memory cell is accessed during a read, write, or refresh operation, the memory controller or, more recently, the central processing unit (CPU) may use the ECC together with the check bits to check for errors. If an error is found, the memory controller or CPU may be able to correct it, depending on the number of bits flipped and on the ECC used. Examples of ECC for correcting soft errors include Hamming codes and Reed-Solomon codes.
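As an illustration of how check bits make a single flipped bit correctable, here is a minimal Python sketch of the textbook Hamming(7,4) code. It is not the code used by any particular DRAM device, and the function names are hypothetical.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword laid out as positions 1..7."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(codeword):
    """Return (corrected codeword, error position); position 0 means no error."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4  # the syndrome names the flipped position
    if position:
        c[position - 1] ^= 1         # flip the bad bit back
    return c, position


clean = hamming74_encode([1, 0, 1, 1])
damaged = list(clean)
damaged[4] ^= 1                      # soft error at position 5
repaired, where = hamming74_correct(damaged)
print(where, repaired == clean)      # 5 True
```

The syndrome both detects the error and gives its location, which is the kind of information an error address register could record. This code corrects only one flipped bit per codeword; multi-bit errors need a stronger code.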
A cold boot attack is a side-channel attack in which an attacker with physical access to a computer can retrieve encryption keys from a running operating system after using a cold reboot to restart the machine. The attack relies on the data remanence of DRAM to retrieve memory contents that remain readable after power is removed. In a cold boot attack, power is cut without letting the operating system perform its shutdown operations, and the memory contents are dumped to a file. A low-temperature attack is a type of cold boot attack in which the DRAM is first cooled to slow the capacitor leakage in each DRAM cell. By slowing the memory leakage in the DRAM device, the attacker can dump more DRAM information to a file, which improves the likelihood of successfully stealing encryption keys.
As used herein, "memory information" is any information about a DRAM that can be used to predict failure in the DRAM or to detect a low-temperature attack on the memory. For example, memory information may include an error count, the temperature of the DRAM, or a count of the number of sequential read operations performed without a write operation. A "word" is the natural unit of data used by a particular processor design and is related to the size of bus transfers. Some modern computers and servers use 64-bit words, but other word sizes exist, and the disclosure should not be limited to any particular word size.
"Error rate" refers to the rate at which new errors occur in the DRAM. For example, if 15 new errors occur in a 3-second interval, the error rate is 5 new errors per second. "Error acceleration" is the change in the error rate over a period of time. For example, if the error rate rises from 5 new errors per second to 10 new errors per second within a 1-second interval, the error acceleration is 5 new errors per second squared. An "error indicator" is any information about the DRAM that can be compared with an established threshold to determine whether a DRAM failure is occurring or whether a low-temperature attack on the DRAM is under way. For example, an error indicator may be an error count, an error rate, an error acceleration, or the DRAM temperature. An "associated threshold" is the threshold that corresponds to a given error indicator. For example, the associated threshold of the error count may be the maximum number of errors the DRAM can tolerate, and the associated threshold of the error rate may be the maximum tolerable rate of new errors in the DRAM.
A "repair action" includes any action performed on the DRAM to repair or prevent soft errors. For example, in some embodiments, the repair action may be to run a memory scrubbing operation. In other embodiments, especially when a memory cell or row has accumulated many errors, the repair action may be to back up or flag the hardware so that the computer system no longer stores information in the affected memory cell or row, or so that the DRAM is eventually replaced.
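The error rate and error acceleration definitions can be checked with the arithmetic from the two examples above. The following is a minimal Python sketch; the function names are illustrative assumptions and do not come from the disclosure.

```python
def error_rate(new_errors, interval_s):
    """New errors per second over one observation interval."""
    return new_errors / interval_s


def error_acceleration(rate_start, rate_end, interval_s):
    """Change in error rate per second (new errors per second squared)."""
    return (rate_end - rate_start) / interval_s


print(error_rate(15, 3))             # 5.0 new errors per second
print(error_acceleration(5, 10, 1))  # 5.0 new errors per second squared
```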
Turning now to the figures, Fig. 1 is a high-level block diagram of an example computer system (for example, a server) 101 in which illustrative embodiments of the methods, tools, modules, and any related functions described herein may be implemented (for example, using one or more processor circuits or computer processors of a computer). In some embodiments, the major components of computer system 101 may include one or more CPUs 102, memory controller 105, memory 104, terminal interface 113, storage interface 114, input/output (I/O) device interface 116, and network interface 118, all of which may be communicatively coupled, directly or indirectly, via memory bus 103, I/O bus 112, and I/O bus interface unit 111 for inter-component communication.
Computer system 101 may include one or more general-purpose programmable central processing units (CPUs) 102A, 102B, 102C, and 102D, referred to generically herein as CPU 102. In some embodiments, computer system 101 may include multiple processors, as is typical of a relatively large system. In other embodiments, computer system 101 may be a single-CPU system. Each CPU 102 may execute instructions stored in memory 104 and may include one or more levels of on-board cache (not shown).
Memory 104 may include computer-system-readable media in the form of volatile memory, for example dynamic random access memory (DRAM) 106. Computer system 101 may also include removable or non-removable, volatile or non-volatile computer system storage media. By way of example only, a storage system may be provided for reading from and writing to non-removable, non-volatile magnetic media (for example, a "hard drive"). Although not shown, a magnetic disk drive may be provided for reading from and writing to a removable, non-volatile magnetic disk (for example, a floppy disk), as may an optical disk drive for reading from or writing to a removable, non-volatile optical disk (for example, a CD-ROM, DVD-ROM, or other optical media). In addition, memory 104 may include flash memory, such as a flash memory stick or flash drive. Memory devices may be connected to memory bus 103 by one or more data media interfaces. Memory 104 may include at least one program product having a set (at least one) of program modules configured to carry out the functions of the various embodiments.
Memory 104 may also include ECC check bits 107, ECC controller 108, and error logging unit 110. The error logging unit may include error address register (EAR) group 110A and error count register (ECR) 110B. EAR group 110A may be a set of registers, each of which stores the location of an error detected by ECC controller 108. For example, EAR group 110A may store the row address, the column address, or both the row and column addresses of the memory cell in which ECC controller 108 detected an error. ECR 110B may be a register that stores an error count. The error count is a running count of the errors found by ECC controller 108. ECR 110B and EAR group 110A may be reset periodically. Alternatively, in some embodiments, ECR 110B and EAR group 110A may be reset by memory controller 105, ECC controller 108, or a user as part of a repair action.
ECC controller 108 may be configured to use check bits 107 and an error-correcting code (for example, a Hamming code or a Reed-Solomon code) to perform forward error correction (FEC) on DRAM 106. The number of ECC check bits 107 stored in memory may depend on the size of DRAM 106 and on the error-correcting code used. ECC controller 108 may also be configured to increment ECR 110B when it detects a soft error and to store information about the error (for example, its row and column address) in EAR group 110A.
In some embodiments, error logging unit 110 may include additional registers and register groups that store more information about the DRAM. For example, in some embodiments, error logging unit 110 may store a multi-bit error count, which is incremented whenever ECC controller 108 detects a multi-bit error. In some embodiments, error logging unit 110 may store a group-specific error count that tallies the number of errors found in a particular memory group of DRAM 106. In addition, error logging unit 110 may store an uncorrectable-error flag that sends a warning to memory controller 105 whenever an uncorrectable error is found in DRAM 106.
In some embodiments, as described above, ECC controller 108 may communicate directly with error logging unit 110, for example to increment the error count in ECR 110B. In other embodiments, however, ECC controller 108 may communicate with error logging unit 110 indirectly. For example, when an error is detected, ECC controller 108 may report the location of the error to memory controller 105. Memory controller 105 may then communicate with the error logging unit to increment ECR 110B and to store the error address in EAR group 110A.
In some embodiments, there may be multiple memory controllers. For example, a CPU may have an integrated memory controller designed to interact with external memory devices. In some embodiments, an external memory controller may include the ECC controller.
Memory 104 may include additional chips, sensors, or controllers that are not shown. For example, memory 104 may include a temperature sensor, a fault detection unit, or a low-temperature attack detection unit. The fault detection unit may predict the onset of a catastrophic failure in memory 104 and send a warning to a controller. The low-temperature attack detection unit may monitor DRAM 106 for signs that memory 104 is undergoing a low-temperature attack, which is a type of cold boot attack. The temperature sensor may monitor the operating temperature of the DRAM and assist the low-temperature attack detection unit. The fault detection unit is discussed in more detail with reference to Fig. 4 and Fig. 5, and the low-temperature attack detection unit is discussed in more detail with reference to Fig. 6 and Fig. 7.
Although Fig. 1 depicts memory bus 103 as a single bus structure that provides a direct communication path among CPUs 102, memory 104, and I/O bus interface 111, in some embodiments memory bus 103 may include multiple different buses or communication paths, which may be arranged in any of various forms (for example, point-to-point links in a hierarchy, star or web configurations, multiple hierarchical buses, parallel or redundant paths, or any other appropriate type of configuration). Moreover, although I/O bus interface 111 and I/O bus 112 are each shown as a single unit, in some embodiments computer system 101 may include multiple I/O bus interface units 111, multiple I/O buses 112, or both. In addition, although multiple I/O interface units are shown, which separate I/O bus 112 from the various communication paths running to the various I/O devices, in other embodiments some or all of the I/O devices may be connected directly to one or more system I/O buses.
In some embodiments, computer system 101 may be a multi-user mainframe computer system, a single-user system, or a server computer or similar device that has little or no direct user interface but receives requests from other computer systems (clients). In addition, in some embodiments, computer system 101 may be implemented as a desktop computer, portable computer, laptop or notebook computer, tablet computer, pocket computer, telephone, smartphone, network switch or router, or any other appropriate type of electronic device.
Note that Fig. 1 is intended to depict the representative major components of an example computer system 101. In some embodiments, however, individual components may be more or less complex than shown in Fig. 1, components other than or in addition to those shown in Fig. 1 may be present, and the number, type, and configuration of such components may vary.
Referring now to Fig. 2, a flowchart of an example method 200 for recording memory information about a DRAM device according to an embodiment of the disclosure is shown. In some embodiments, method 200 may be performed by ECC controller 108 (shown in Fig. 1) embedded in memory 104. In other embodiments, method 200 may be performed by memory controller 105 of computer system 101. The method may begin at operation 202, where a read, write, or refresh operation is performed on the DRAM device.
As part of, or in connection with, the read, write, or refresh operation on the word at a memory address, the ECC controller may check the word for errors at operation 204. The ECC controller may use an existing error-correcting code or algorithm (for example, a Hamming code or a Reed-Solomon code) to check the word for errors.
At operation 206, the ECC controller may determine whether any error was detected in the word during operation 204. If no error was detected, the method ends. If an error was detected at operation 204, however, then per operation 208 the ECC controller may increment the error count stored in the ECR and store the row address of the error location in an available register of the EAR group.
Think the maximum number of tolerable mistake in DRAM.If error count is lower than the threshold value, the method terminates.So
And if it is more than error thresholds that ECC controller, which determines error count, ECC controller will be arranged error flag, such as operate
Described in 212.Error flag can be a piece of news, store in memory or be transmitted directly to memory control
Device processed indicates that error count alreadys exceed the threshold value.It in some alternative embodiments, can be by DRAM device
A pin be driven to it is high or low, be arranged error flag.
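Operations 204 through 212 of method 200 can be summarized in a short model. The Python sketch below is an illustration under stated assumptions, not the disclosed hardware: the class and function names, the number of EAR registers, and the choice to treat a count equal to the threshold as not exceeding it are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ErrorLoggingUnit:
    error_threshold: int       # assumed to be configured elsewhere
    ear_slots: int = 8         # assumed number of registers in the EAR group
    ecr: int = 0               # error count register (ECR)
    ear: list = field(default_factory=list)  # error address registers (EAR group)
    error_flag: bool = False


def record_access(log, row_address, word_has_error):
    """Model operations 206-212 for one checked word (operation 204 is the caller's ECC check)."""
    if not word_has_error:              # operation 206: no error, method ends
        return
    log.ecr += 1                        # operation 208: increment the error count
    if len(log.ear) < log.ear_slots:    # store the row address if a register is free
        log.ear.append(row_address)
    if log.ecr > log.error_threshold:   # operation 210: compare with the threshold
        log.error_flag = True           # operation 212: set the error flag


log = ErrorLoggingUnit(error_threshold=2)
for row, bad in [(0x10, True), (0x11, False), (0x2A, True), (0x2A, True)]:
    record_access(log, row, bad)
print(log.ecr, log.ear, log.error_flag)  # 3 [16, 42, 42] True
```

The sketch stops recording addresses once the assumed EAR group is full; the text does not say what a device does in that case.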
Referring now to Fig. 3, a flowchart of an example method 300 for monitoring the integrity of a DRAM device according to an embodiment of the disclosure is shown. In some embodiments, method 300 may be performed by ECC controller 108 (shown in Fig. 1) embedded in memory 104. In other embodiments, method 300 may be performed by memory controller 105 integrated in computer system 101. In still other embodiments, method 300 may be performed by a dedicated controller or chip embedded in the DRAM device. The method may begin at operation 302, where the memory controller determines whether an error flag has been set. For example, the memory controller may determine whether the error flag is set by reading a message stored at a specific memory address. If a predetermined message indicating that the error threshold has been exceeded is found at that memory address, the memory controller may determine that the error flag is set.
If the memory controller determines that the error flag is set, it may perform a repair action per operation 304. If the memory controller determines at operation 302 that the error flag is not set, it may retrieve memory information from the error logging unit per operation 306. The retrieved memory information may include the error count stored in ECR 110B and the list of error addresses stored in EAR group 110A. At operation 308, the memory controller may determine whether there is a large number of new errors. To do so, the memory controller may compare the total number of new errors with a threshold.
In some embodiments, the memory controller may compare the number of new errors in a particular register group with a group-specific threshold. In other embodiments, the memory controller may compare the number of new errors at a particular address (including at a particular row or column) with a corresponding address threshold. In either case, the threshold may be configured by a user, or it may be set by the memory manufacturer and stored on the DRAM device or in non-volatile storage on the module. If the memory controller determines that there is not a large number of new errors, the method ends.
When the memory controller determines that there is a large number of new errors, it may schedule a repair action per operation 310. In some embodiments, the memory controller may perform the repair action immediately after determining that there is a large number of new errors and that repair resources are available.
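The decision flow of method 300 reduces to two checks. The following Python sketch is a simplified illustration: the function name, the returned labels, and the use of a single total-count threshold (rather than group-specific or address-specific thresholds) are assumptions made for brevity.

```python
def monitor_integrity(error_flag, new_error_count, new_error_threshold):
    """Return the action chosen by a simplified model of method 300."""
    if error_flag:                             # operation 302: error flag set?
        return "perform_repair"                # operation 304
    # Operation 306: new_error_count stands in for the retrieved memory information.
    if new_error_count > new_error_threshold:  # operation 308: many new errors?
        return "schedule_repair"               # operation 310
    return "no_action"                         # method ends


print(monitor_integrity(True, 0, 10))    # perform_repair
print(monitor_integrity(False, 25, 10))  # schedule_repair
print(monitor_integrity(False, 3, 10))   # no_action
```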
Referring now to Fig. 4, a structure diagram of a DRAM device with embedded ECC for predicting DRAM failure according to an embodiment of the disclosure is shown. The DRAM device includes DRAM array 402, ECC controller 406, error logging unit 408, and fault detection unit 410, which may be communicatively coupled, directly or indirectly, to a computer system (not shown) through I/O 404 connected to external I/O 412. I/O 404 is a DRAM driver, embedded on the DRAM device, that supplies voltage or current to external I/O 412. External I/O 412 may be a new, additional pin on the DRAM device or a new multiplexed definition of an existing pin.
ECC controller 406 may use an error-correcting code (for example, a Hamming code or a Reed-Solomon code) to detect single-bit and multi-bit errors in DRAM array 402. Depending on the error-correcting code used, ECC controller 406 may also correct the errors it detects, especially single-bit errors. When ECC controller 406 detects an error, memory information is stored in error logging unit 408.
The memory information may include error address 408A, which is the memory address of the cell in which ECC controller 406 detected an error, and error count 408B, which is a running count of the errors found by ECC controller 406 since error logging unit 408 was last reset. In some embodiments, error logging unit 408 may store additional memory information. For example, error logging unit 408 may store a second error count corresponding to the number of multi-bit errors detected by the ECC controller. In other embodiments, ECC controller 406 may store group-specific error counts that tally the errors detected by ECC controller 406 in each unique group of DRAM array 402. In still other embodiments, the memory information stored in error logging unit 408 may include an uncorrectable error count (the number of uncorrectable errors detected by ECC controller 406), the temperature of the DRAM, and an error flag.
Fault detection unit 410 may store decision parameters 411. Decision parameters 411 are the parameters that fault detection unit 410 uses to determine error indicators, such as the DRAM's error rate and error acceleration, together with the thresholds against which the error indicators are compared to predict failure in the DRAM. For example, the decision parameters may include thresholds 411A, 411B, and 411C and time periods 411D and 411E. First threshold 411A may be a maximum tolerable error count, second threshold 411B may be a maximum tolerable error rate, and third threshold 411C may be a maximum tolerable error acceleration. First time period 411D may be used to calculate the error rate, and second time period 411E may be used to calculate the error acceleration.
Fault detection unit 410 may monitor error count 408B over first time period 411D to determine the error rate. The fault detection unit may then monitor the error rate over second time period 411E to determine the error acceleration. Once the fault detection unit has determined the error count, error rate, and error acceleration, it may compare them with their associated thresholds 411A, 411B, and 411C, respectively. If any of the error count, error rate, or error acceleration exceeds its associated threshold, the fault detection unit may warn the memory controller that the DRAM may fail or that the memory may become unstable.
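A fault detection unit of this kind can be modeled as a small function over the decision parameters. The Python sketch below is an assumption-laden illustration: the class, field, and function names are hypothetical, the sampling scheme (two error count readings and one previous rate) is a simplification, and the warning condition follows the "at least one" wording used in the summary.

```python
from dataclasses import dataclass


@dataclass
class DecisionParameters:
    max_error_count: float  # threshold 411A
    max_error_rate: float   # threshold 411B
    max_error_accel: float  # threshold 411C
    rate_period_s: float    # time period 411D
    accel_period_s: float   # time period 411E


def predict_failure(count_start, count_end, previous_rate, params):
    """Return (warn, details) from two error count samples and the previous rate."""
    rate = (count_end - count_start) / params.rate_period_s
    accel = (rate - previous_rate) / params.accel_period_s
    exceeded = {
        "error_count": count_end > params.max_error_count,
        "error_rate": rate > params.max_error_rate,
        "error_accel": accel > params.max_error_accel,
    }
    # Warn the memory controller if any indicator exceeds its threshold.
    return any(exceeded.values()), exceeded


params = DecisionParameters(100, 4, 10, 3, 1)
warn, details = predict_failure(0, 15, 0, params)
print(warn, details)
```

With these sample values the error rate is 5 new errors per second, which exceeds the assumed rate threshold of 4, so the sketch raises a warning even though the count and acceleration are within bounds.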
In some embodiments, the fault detection unit may include a controller that compares the memory information stored in error logging unit 408 with decision parameters 411. In other embodiments, ECC controller 406 or an external memory controller (for example, memory controller 105 shown in Fig. 1) may be configured to compare the memory information with decision parameters 411. In these embodiments, fault detection unit 410 may only store decision parameters 411.
Fig. 4 depicts the representative major components of an example DRAM device 401. In some embodiments, however, individual components may be more or less complex than those shown in Fig. 4; components other than, or in addition to, those shown in Fig. 4 may be present; and the number, type, and configuration of such components may vary. For example, in certain embodiments, a single memory controller may be configured to perform the functions of both ECC controller 406 and fault detection unit 410.
In some embodiments, decision parameters 411 may be configured by the user. In other embodiments, the DRAM manufacturer may preload decision parameters 411 into nonvolatile memory. In some embodiments, the information stored in error logging unit 408 (for example, error addresses 408A and error count 408B) may be reset after the DRAM performs a given operation (for example, a memory erase operation). In other embodiments, error logging unit 408 may be reset by fault detection unit 410 or memory controller 105.
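As a rough illustration of the logging and reset behavior, the following Python sketch models an error log with a total count, a limited set of address registers, and bank-specific counts. The class name `ErrorLog`, its fields, and the default capacity of eight address registers are assumptions made for this example. The sketch also assumes that an address is simply not stored when no register is available, a case the text above does not spell out.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ErrorLog:
    """Hypothetical software model of error logging unit 408."""
    capacity: int = 8  # assumed number of error-address registers
    error_addresses: List[int] = field(default_factory=list)
    error_count: int = 0
    bank_error_counts: Dict[int, int] = field(default_factory=dict)

    def record(self, address: int, bank: int) -> None:
        """Log one detected error: bump the counts and keep the address if a register is free."""
        self.error_count += 1
        self.bank_error_counts[bank] = self.bank_error_counts.get(bank, 0) + 1
        if len(self.error_addresses) < self.capacity:
            self.error_addresses.append(address)

    def reset(self) -> None:
        """Clear all logged information, e.g. after a given DRAM operation completes."""
        self.error_addresses.clear()
        self.error_count = 0
        self.bank_error_counts.clear()
```

In this model, `reset` could be called by whichever component owns the log, mirroring the alternatives described above.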
Referring now to Fig. 5, shown is a flowchart of an example method 500 for predicting a failure of a DRAM device, according to an embodiment of the present disclosure. In some embodiments, method 500 may be performed by fault detection unit 410 embedded in the DRAM. In other embodiments, method 500 may be performed by memory controller 105 integrated in computer system 101. The method may begin at operation 502, where the fault detection unit receives memory information from the error logging unit.
At operation 504, the fault detection unit may determine error indicators by processing the memory information received at operation 502. The error indicators may be the error count, the error rate (the rate at which new errors appear in the DRAM), and the error acceleration (the change in the rate of new errors over a period of time). In certain embodiments, the error indicators may also include the uncorrectable error rate (the rate at which uncorrectable errors appear in the DRAM) and the uncorrectable error acceleration (the change in the uncorrectable error rate over a period of time). An error is considered uncorrectable if the number of flipped bits exceeds the maximum number that the error-correcting code embedded on the DRAM device can correct. After determining the error indicators (in this example, the error count, error rate, and error acceleration), the fault detection unit may determine whether any error indicator exceeds its associated threshold.
First, per operation 506, the fault detection unit may determine whether the error count exceeds its associated threshold. If the error count exceeds its associated threshold, the fault detection unit may send an alert to memory controller 105 (shown in Fig. 1) at operation 512, and the process may end. If the error count does not exceed its associated threshold at operation 506, then, per operation 508, the fault detection unit may determine whether the error rate exceeds its associated threshold. If the error rate exceeds its associated threshold, the fault detection unit may send an alert to memory controller 105 at operation 512, and the process may end. If the error rate does not exceed its associated threshold, then, per operation 510, the fault detection unit may determine whether the error acceleration exceeds its associated threshold. If the error acceleration exceeds its associated threshold, the fault detection unit may send an alert to memory controller 105 at operation 512, and the process may end. If the error acceleration does not exceed its associated threshold, the method may start over at operation 502, where the fault detection unit again receives memory information from the error logging unit.
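The control flow of operations 502 through 512 can be sketched as a loop that checks the three indicators in order and stops at the first one that is over its threshold. The function names `first_exceeded` and `run_method_500`, the tuple layout, and the `alert` callback are hypothetical and introduced only for illustration; a finite sequence of samples stands in for the repeated reads of the error logging unit.

```python
from typing import Callable, Iterable, Optional, Tuple

# (error count, error rate, error acceleration), as determined at operation 504
Indicators = Tuple[int, float, float]
Thresholds = Tuple[int, float, float]

NAMES = ("error_count", "error_rate", "error_acceleration")


def first_exceeded(indicators: Indicators, thresholds: Thresholds) -> Optional[str]:
    """Check count, rate, and acceleration in that order (operations 506, 508, 510)."""
    for name, value, limit in zip(NAMES, indicators, thresholds):
        if value > limit:
            return name
    return None


def run_method_500(samples: Iterable[Indicators], thresholds: Thresholds,
                   alert: Callable[[str], None]) -> bool:
    """Process samples until one triggers an alert; return True if an alert was sent."""
    for indicators in samples:                 # operations 502 and 504
        exceeded = first_exceeded(indicators, thresholds)
        if exceeded is not None:
            alert(exceeded)                    # operation 512
            return True                        # the process ends after alerting
    return False                               # no sample exceeded any threshold
```

Passing the alert mechanism in as a callback keeps the decision logic independent of how the memory controller is actually notified.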
The alert to memory controller 105 at operation 512 may be implemented in many ways. In certain embodiments, alerting memory controller 105 may include raising or lowering a dedicated pin on the DRAM device, for example, driving a parity error pin on the DRAM device high or low. In other embodiments, alerting memory controller 105 may include intentionally corrupting the data that is read out so as to trigger a cyclic redundancy check (CRC) error. In still other embodiments, alerting memory controller 105 may include sending the controller a predefined data pattern that the controller recognizes as an error signal. The present disclosure does not require, and should not be limited to, any particular method of alerting the memory controller when an error indicator exceeds its associated threshold.
Referring now to Fig. 6, shown is a block diagram of a DRAM device with embedded ECC that is capable of detecting and responding to a low-temperature (cryogenic) attack, according to an embodiment of the present disclosure. The DRAM device includes DRAM array 402, ECC controller 406, temperature sensor 618, error logging unit 408, low-temperature attack detection unit 610, and fuse 612. The DRAM device is communicatively coupled to a computer system (not shown) through I/O 404, which is connected to data pins 620, and through command (CMD) decoder 614, which is connected to command (CMD) pins 616.
Data pins 620 are the data I/O pins of the DRAM device. Data pins 620 are bidirectional, with an input capability that allows data to be written and an output capability that allows data to be read. Depending on the device, there may be 4, 8, 16, or 32 data pins 620. Other configurations are also possible, and the present disclosure should not be limited to devices with any given number of data pins 620.
CMD pins 616 are a set of input pins on the DRAM device that provide commands (for example, read, write, refresh) multiplexed with the address of the cells to be accessed. CMD decoder 614 is a decoder on the DRAM device that interprets the encoded command input provided on CMD pins 616 and enables the DRAM to perform the appropriate operation (for example, read, write, refresh).
DRAM array 402, ECC controller 406, and error logging unit 408 operate as described above with reference to Fig. 4. ECC controller 406 uses error-correcting codes to detect errors in DRAM array 402 and stores memory information in error logging unit 408. In addition, error logging unit 408 stores DRAM temperature 608C, which it receives from temperature sensor 618.
Much like the fault detection unit described herein, low-temperature attack detection unit 610 stores a plurality of decision parameters 611. Decision parameters 611 may be configured by the user or set by the manufacturer, and they help low-temperature attack detection unit 610 determine whether the DRAM is undergoing a low-temperature attack.
For example, low-temperature attack detection unit 610 may use first time period 611C to calculate the uncorrectable error rate and second time period 611D to calculate the uncorrectable error acceleration. Low-temperature attack detection unit 610 may determine whether the uncorrectable error rate and the uncorrectable error acceleration exceed first threshold 611A and second threshold 611B, respectively. In addition, in certain embodiments, low-temperature attack detection unit 610 may compare DRAM temperature 608C with temperature threshold 611E to determine whether the DRAM is operating at a temperature indicative of a low-temperature attack.
If low-temperature attack detection unit 610 determines that the DRAM is undergoing a low-temperature attack, it may disable access to the information on the DRAM. In some embodiments, this may be done by blowing fuse 612. Fuse 612 may be, for example, an electronic fuse (e-fuse), a programmable resistor, or a phase-change resistor.
Referring now to Fig. 7, shown is a flowchart of an example method 700 for detecting and responding to a low-temperature attack on a DRAM device, according to an embodiment of the present disclosure. In some embodiments, method 700 may be performed by low-temperature attack detection unit 610 embedded in memory 104. In other embodiments, method 700 may be performed by memory controller 105 integrated in computer system 101. In still other embodiments, the method may be included in DRAM control logic.
First, method 700 may include determining a set of error indicators by processing a set of memory information using a set of decision parameters. The low-temperature attack detection unit may then compare the set of error indicators with a predetermined attack signature group. The attack signature group may be a set of conditions that are likely to arise when a DRAM device undergoes a low-temperature attack or is electrically probed. For example, in certain embodiments, the attack signature group may be a high error rate, particularly when the errors are uncorrectable. In other embodiments, the attack signature group may be a high error acceleration, again particularly when the errors are uncorrectable. In still other embodiments, the attack signature group may be an extremely low temperature.
In some embodiments of the present disclosure, the attack signature group may be stored by the DRAM manufacturer in nonvolatile memory on the DRAM device. For example, the attack signature group may be the thresholds stored as decision parameters in the low-temperature attack detection unit. In other embodiments, the attack signature group may be configured by the user. In still other embodiments, the DRAM manufacturer may store a first attack signature group in nonvolatile memory on the DRAM device, and the user may create additional attack signature groups to meet specific needs.
In certain embodiments, the attack signature group may be a combination of a high error rate, a high error acceleration, and/or a low temperature. For example, in example method 700 the attack signature group is characterized as a high uncorrectable error rate (above a first threshold) together with an elevated uncorrectable error acceleration (above a second threshold), where the elevated uncorrectable error acceleration is lower than the uncorrectable error acceleration that would be expected if the computer system were undergoing a normal shutdown operation (that is, below a third threshold). Method 700 may begin at operation 702, where the low-temperature attack detection unit receives memory information from the error logging unit.
At operation 704, the low-temperature attack detection unit may determine the uncorrectable error (UE) rate and acceleration. The UE rate may be determined by counting the number of uncorrectable errors found over a specified time period (for example, first time period 611C stored in low-temperature attack detection unit 610). For example, if first time period 611C is 2 seconds and 4 new uncorrectable errors are detected during those 2 seconds, the UE rate is determined to be 2 uncorrectable errors per second. The UE acceleration may be determined by calculating the change in the UE rate over a period of time (for example, second time period 611D). For example, if the UE rate changes from 1 uncorrectable error per second to 3 uncorrectable errors per second over a 1-second period, the UE acceleration may be determined to be 2 uncorrectable errors per second squared.
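The two worked figures above follow directly from dividing a change by the length of the period over which it was observed. A minimal Python sketch, with the hypothetical helper names `ue_rate` and `ue_acceleration`, reproduces them:

```python
def ue_rate(new_uncorrectable_errors: int, period_s: float) -> float:
    """Uncorrectable errors per second over the first time period."""
    return new_uncorrectable_errors / period_s


def ue_acceleration(rate_before: float, rate_after: float, period_s: float) -> float:
    """Change in UE rate per second over the second time period."""
    return (rate_after - rate_before) / period_s


# 4 new uncorrectable errors in a 2-second period
print(ue_rate(4, 2.0))                 # 2.0 uncorrectable errors per second

# UE rate rising from 1 to 3 errors per second over a 1-second period
print(ue_acceleration(1.0, 3.0, 1.0))  # 2.0 uncorrectable errors per second squared
```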
At operation 706, the low-temperature attack detection unit may determine whether the UE rate exceeds the first threshold. This may be done, for example, by comparing the calculated UE rate with first threshold 611A stored in low-temperature attack detection unit 610. If the UE rate does not exceed the first threshold, the method may start over at operation 702. If the UE rate exceeds the first threshold, then, per operation 708, the low-temperature attack detection unit may determine whether the UE acceleration exceeds the second threshold.
If the UE acceleration does not exceed the second threshold, the method may start over at operation 702. If the UE acceleration exceeds the second threshold, the low-temperature attack detection unit may determine at operation 710 whether the UE acceleration exceeds the third threshold. Comparing the UE acceleration with the third threshold may, for example, distinguish the effect of a low-temperature attack on the memory from the effect of a system restart or shutdown on the memory. If the UE acceleration exceeds the third threshold (indicating that the system is shutting down or restarting), the method may start over at operation 702. If the UE acceleration does not exceed the third threshold, the low-temperature attack detection unit may disable access to the information on the DRAM at operation 712, and the process may end.
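The three comparisons of operations 706, 708, and 710 amount to a band check: the UE rate must be above the first threshold, and the UE acceleration must fall between the second and third thresholds. The following Python sketch shows one way to express that decision. The `Verdict` enum and the `classify` function are hypothetical names for this example, and the numeric thresholds in the usage lines are arbitrary illustrative values.

```python
from enum import Enum


class Verdict(Enum):
    NO_ATTACK = "no attack signature; return to operation 702"
    SHUTDOWN_OR_RESTART = "acceleration consistent with shutdown or restart; return to operation 702"
    ATTACK = "attack signature matched; disable access at operation 712"


def classify(ue_rate: float, ue_accel: float, first_threshold: float,
             second_threshold: float, third_threshold: float) -> Verdict:
    """Apply the checks of operations 706, 708, and 710 in order."""
    if ue_rate <= first_threshold:        # operation 706
        return Verdict.NO_ATTACK
    if ue_accel <= second_threshold:      # operation 708
        return Verdict.NO_ATTACK
    if ue_accel > third_threshold:        # operation 710
        return Verdict.SHUTDOWN_OR_RESTART
    return Verdict.ATTACK


# Illustrative thresholds: rate > 5/s, acceleration between 2 and 50 per second squared
print(classify(10.0, 8.0, 5.0, 2.0, 50.0))   # Verdict.ATTACK
print(classify(10.0, 80.0, 5.0, 2.0, 50.0))  # Verdict.SHUTDOWN_OR_RESTART
```

Keeping the shutdown case as its own verdict makes it explicit that a very high acceleration is treated as benign rather than as a stronger attack signal.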
There are many ways to disable access to the information on the DRAM at operation 712. For example, in certain embodiments, access to the information may be blocked by a permanent self-destruct mechanism, such as blowing fuse 612 (cutting off the power supply or the voltage regulator output). In other embodiments, when a low-temperature attack is detected, the DRAM blows fuse 612, thereby disabling CMD decoder 614. With CMD decoder 614 disabled, all incoming commands are ignored, and the DRAM does not allow its data to be read.
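The effect of disabling the command decoder can be modeled in software as a one-way gate in front of the memory cells. The sketch below is a toy model, not a description of the hardware: the class `CommandDecoder`, its methods, and the string command names are assumptions made for illustration.

```python
from typing import Dict, Optional


class CommandDecoder:
    """Toy model of CMD decoder 614 gated by fuse 612 (hypothetical interface)."""

    def __init__(self, cells: Dict[int, int]) -> None:
        self._cells = dict(cells)
        self._fuse_intact = True

    def blow_fuse(self) -> None:
        """Permanently disable the decoder; the model offers no way to restore it."""
        self._fuse_intact = False

    def execute(self, command: str, address: int,
                value: Optional[int] = None) -> Optional[int]:
        """Decode and run a command, or ignore it entirely once the fuse is blown."""
        if not self._fuse_intact:
            return None                      # every incoming command is ignored
        if command == "READ":
            return self._cells.get(address)
        if command == "WRITE" and value is not None:
            self._cells[address] = value
        return None
```

After `blow_fuse()` is called, reads return nothing and writes leave the stored data untouched, which matches the behavior described above.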
In still other embodiments, the DRAM may overwrite the entire array to a known state. The overwrite of all cells may be accomplished by using the refresh address counter, which supplies row addresses, to cycle through all row addresses while executing a special write cycle that forces all sense amplifiers into a predetermined state.
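A software analogue of this overwrite is a loop in which a row counter steps through every row address and the whole row is forced to one value. In the sketch below, the function name `overwrite_array`, the list-of-lists array model, and the default known value of 0 are assumptions chosen for illustration.

```python
from typing import List


def overwrite_array(array: List[List[int]], known_value: int = 0) -> None:
    """Force every cell to known_value by stepping a row counter through all rows."""
    columns = len(array[0]) if array else 0
    for row_address in range(len(array)):            # plays the role of the refresh address counter
        array[row_address] = [known_value] * columns  # the whole row is written in one cycle
```

Writing a full row per step mirrors the idea that the sense amplifiers are forced to a predetermined state and the row is then restored from them.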
As discussed in more detail herein, it should be understood that some or all of the operations of some embodiments of the methods described herein may be performed in an alternative order or may not be performed at all; furthermore, multiple operations may occur at the same time or as an internal part of a larger process.
The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.
The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device (for example, punch cards or raised structures in a groove having instructions recorded thereon), and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (for example, light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium, or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network, and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards them for storage in a computer readable storage medium within the respective computing/processing device.
Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk, C++, or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.
Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus, or other device to produce a computer-implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or by combinations of special purpose hardware and computer instructions.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the various embodiments. As used herein, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "includes" and/or "including," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. In the previous detailed description of exemplary embodiments of the various embodiments, reference was made to the accompanying drawings (where like numbers represent like elements), which form a part hereof, and in which is shown by way of illustration specific exemplary embodiments in which the various embodiments may be practiced. These embodiments were described in sufficient detail to enable those skilled in the art to practice the embodiments, but other embodiments may be used, and logical, mechanical, electrical, and other changes may be made without departing from the scope of the various embodiments. In the previous description, numerous specific details were set forth to provide a thorough understanding of the various embodiments. The various embodiments may, however, be practiced without these specific details. In other instances, well-known circuits, structures, and techniques have not been shown in detail in order not to obscure the embodiments.
Different instances of the word "embodiment" as used within this specification do not necessarily refer to the same embodiment, but they may. Any data and data structures illustrated or described herein are examples only, and in other embodiments, different amounts of data, types of data, fields, numbers and types of fields, field names, numbers and types of rows, records, entries, or organizations of data may be used. In addition, any data may be combined with logic, so that a separate data structure may not be necessary. The previous detailed description is, therefore, not to be taken in a limiting sense.
Although the present invention has been described in terms of specific embodiments, it is anticipated that alterations and modifications thereof will become apparent to those skilled in the art. Therefore, it is intended that the following claims be interpreted as covering all such alterations and modifications as fall within the spirit and scope of the invention.
Claims (20)
1. A dynamic random access memory (DRAM) device with embedded error-correcting code (ECC), the DRAM device comprising:
a DRAM array;
a first register storing an error count;
a first register bank storing a set of error addresses; and
an ECC controller, wherein the ECC controller is configured to use the ECC to perform error detection and correction (EDAC) on the DRAM array, to increment the error count whenever an error is detected, and to write the error address to an available register in the first register bank,
wherein the DRAM device further comprises a temperature sensor that monitors the temperature of the DRAM device and a nonvolatile memory that stores the value of a temperature threshold.
2. The DRAM device of claim 1, further comprising a second register storing a multi-bit error count.
3. The DRAM device of claim 1, further comprising a second register storing an uncorrectable error flag.
4. The DRAM device of claim 1, further comprising a second register bank storing a set of bank-specific error counts, wherein each bank-specific error count in the set corresponds to a unique memory bank in the DRAM array.
5. The DRAM device of claim 1, further comprising:
a second register bank storing a plurality of decision parameters; and
a fault detection unit that predicts failure of the DRAM device and detects failing rows or columns of the DRAM device.
6. A method for logging and correcting dynamic random access memory (DRAM) errors, the method comprising:
detecting, using an error-correcting code, an error in a word in a DRAM device;
in response to detecting the error, incrementing an error count stored in a first register; and
in response to detecting the error, storing an error address corresponding to the location of the error in an available register in a first register bank,
wherein the method further comprises:
determining whether the error is in a first memory bank; and
in response to detecting that the error is in the first memory bank, incrementing a first bank-specific error count, wherein the first bank-specific error count represents the number of errors in the first memory bank, and storing the first bank-specific error count in a second register bank.
7. The method of claim 6, further comprising:
determining whether the error is a multi-bit error; and
in response to the error being a multi-bit error, incrementing a multi-bit error count stored in a second register.
8. The method of claim 6, further comprising:
determining whether the error is uncorrectable; and
in response to the error being uncorrectable, setting an uncorrectable error flag.
9. The method of claim 6, further comprising:
determining whether the error count exceeds a threshold; and
in response to the error count exceeding the threshold, setting an error flag.
10. The method of claim 9, further comprising performing a repair action in response to the error flag being set.
11. The method of claim 6, further comprising:
determining a first error count at a first time;
determining a second error count at a second time, the second time being after the first time;
determining a number of new errors by comparing the second error count with the first error count;
determining whether the number of new errors exceeds a new-error threshold; and
in response to determining that the number of new errors exceeds the new-error threshold, scheduling a repair action.
12. A method for predicting a failure in a DRAM device, the method comprising:
receiving memory information about the DRAM device;
processing the memory information using a set of decision parameters to determine an error indicator;
determining whether the error indicator exceeds an associated error threshold; and
in response to the error indicator exceeding the associated error threshold, alerting a controller,
wherein:
the error indicator includes an error rate and an error acceleration; and
determining whether the error indicator exceeds the associated error threshold includes comparing the error rate with a first threshold and comparing the error acceleration with a second threshold.
13. The method of claim 12, wherein alerting the controller includes raising a dedicated pin.
14. The method of claim 12, wherein alerting the controller includes sending a predefined read data pattern to the controller.
15. A method for detecting and responding to a low-temperature attack on a dynamic random access memory (DRAM) device, the method comprising:
receiving a set of memory information about the DRAM device;
processing the set of memory information using a set of decision parameters to determine a set of error indicators;
determining whether the set of error indicators matches an attack signature group; and
in response to the set of error indicators matching the attack signature group, disabling access to the DRAM device, wherein:
the set of error indicators includes an error rate and an error acceleration; and
determining whether the set of error indicators matches the attack signature group includes comparing the error rate with a first threshold and comparing the error acceleration with a second threshold.
16. The method of claim 15, wherein disabling access to the DRAM device includes blowing a fuse on the DRAM device.
17. The method of claim 15, wherein disabling access to the DRAM device includes writing the DRAM array to a known state.
18. The method of claim 15, wherein the attack signature group is configured by a user.
19. A system for detecting and responding to a low-temperature attack on a dynamic random access memory (DRAM) device, comprising:
a memory; and
DRAM control logic, wherein the DRAM control logic causes the memory to perform a method comprising:
receiving a set of memory information about the DRAM device;
processing the set of memory information using a set of decision parameters to determine a set of error indicators;
determining whether the set of error indicators matches an attack signature group; and
in response to the set of error indicators matching the attack signature group, disabling access to the DRAM device, wherein:
the set of error indicators includes an error rate and an error acceleration; and
determining whether the set of error indicators matches the attack signature group includes comparing the error rate with a first threshold and comparing the error acceleration with a second threshold.
20. A computer readable storage medium for detecting and responding to a low-temperature attack on a dynamic random access memory (DRAM) device, wherein program instructions are stored in the computer readable storage medium and, when executed, perform the steps of any one of claims 15 to 18.
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/611,351 US9606851B2 (en) | 2015-02-02 | 2015-02-02 | Error monitoring of a memory device containing embedded error correction |
US14/621,506 US9940457B2 (en) | 2015-02-13 | 2015-02-13 | Detecting a cryogenic attack on a memory device with embedded error correction |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105843699A (en) | 2016-08-10 |
CN105843699B (en) | 2019-06-04 |
Family
ID=56580658
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610064309.6A Active CN105843699B (en) | 2015-02-02 | 2016-01-29 | Dynamic random access memory device and method for error monitoring and correction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105843699B (en) |
Families Citing this family (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10248484B2 (en) * | 2017-02-21 | 2019-04-02 | Intel Corporation | Prioritized error-detection and scheduling |
US11080135B2 (en) | 2017-06-27 | 2021-08-03 | Intel Corporation | Methods and apparatus to perform error detection and/or correction in a memory device |
KR20220103205A (en) | 2017-12-29 | 2022-07-21 | 마이크론 테크놀로지, 인크. | Uncorrectable ECC |
CN109062722A (en) * | 2018-07-24 | 2018-12-21 | 郑州云海信息技术有限公司 | Memory error detection method and device |
KR102623234B1 (en) * | 2018-08-14 | 2024-01-11 | 삼성전자주식회사 | Storage device and operation method thereof |
CN110879761A (en) | 2018-09-05 | 2020-03-13 | 华为技术有限公司 | Hard disk fault processing method, array controller and hard disk |
CN111324291B (en) * | 2018-12-14 | 2024-02-20 | 兆易创新科技集团股份有限公司 | Memory device |
US11200105B2 (en) * | 2018-12-31 | 2021-12-14 | Micron Technology, Inc. | Normalization of detecting and reporting failures for a memory device |
KR20200121179A (en) * | 2019-04-15 | 2020-10-23 | 에스케이하이닉스 주식회사 | Semiconductor apparatus and semiconductor system including the same |
US10936209B2 (en) * | 2019-06-06 | 2021-03-02 | Micron Technology, Inc. | Memory error indicator for high-reliability applications |
US11321458B2 (en) * | 2020-01-28 | 2022-05-03 | Nuvoton Technology Corporation | Secure IC with soft security countermeasures |
US11507454B2 (en) * | 2020-10-26 | 2022-11-22 | Oracle International Corporation | Identifying non-correctable errors using error pattern analysis |
WO2023206346A1 (en) * | 2022-04-29 | 2023-11-02 | Nvidia Corporation | Detecting hardware faults in data processing pipelines |
US11972789B2 (en) | 2022-08-17 | 2024-04-30 | Changxin Memory Technologies, Inc. | Memory device with error per row counter (EpRC) performing error check and scrub (ECS) |
CN117636997A (en) * | 2022-08-17 | 2024-03-01 | 长鑫存储技术有限公司 | Counting circuit and memory |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4964130A (en) * | 1988-12-21 | 1990-10-16 | Bull Hn Information Systems Inc. | System for determining status of errors in a memory subsystem |
US6560725B1 (en) * | 1999-06-18 | 2003-05-06 | Madrone Solutions, Inc. | Method for apparatus for tracking errors in a memory system |
US20060109117A1 (en) * | 2004-11-22 | 2006-05-25 | International Business Machines Corporation | Apparatus and Method of Intelligent Multistage System Deactivation |
US8935592B2 (en) * | 2012-11-20 | 2015-01-13 | Arm Limited | Apparatus and method for correcting errors in data accessed from a memory device |
US20140215613A1 (en) * | 2013-01-25 | 2014-07-31 | International Business Machines Corporation | Attack resistant computer system |
- 2016-01-29: CN application CN201610064309.6A, patent CN105843699B, status: Active
Also Published As
Publication number | Publication date |
---|---|
CN105843699A (en) | 2016-08-10 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN105843699B (en) | Dynamic random access memory device and method for error monitoring and correction | |
US10019312B2 (en) | Error monitoring of a memory device containing embedded error correction | |
US9940457B2 (en) | Detecting a cryogenic attack on a memory device with embedded error correction | |
US10725853B2 (en) | Systems and methods for memory failure prevention, management, and mitigation | |
US9886200B2 (en) | Concurrent upgrade and backup of non-volatile memory | |
US10025649B2 (en) | Data error detection in computing systems | |
US9317349B2 (en) | SAN vulnerability assessment tool | |
US9208024B2 (en) | Memory ECC with hard and soft error detection and management | |
CN103026342B (en) | method and system for verifying memory device integrity | |
CN105122213A (en) | Methods and apparatus for error detection and correction in data storage systems | |
KR101983651B1 (en) | Mram field disturb detection and recovery | |
US9996414B2 (en) | Auto-disabling DRAM error checking on threshold | |
US10095570B2 (en) | Programmable device, error storage system, and electronic system device | |
CN102135925A (en) | Method and device for detecting error check and correcting memory | |
CH717594A2 (en) | Systems and methods for the detection of behavioral anomalies in applications. | |
US10338999B2 (en) | Confirming memory marks indicating an error in computer memory | |
US9965346B2 (en) | Handling repaired memory array elements in a memory of a computer system | |
US9009548B2 (en) | Memory testing of three dimensional (3D) stacked memory | |
CN105005513A (en) | Detection and fault-tolerant device and method for cache multi-digit data upset errors | |
CN110737539A (en) | Die level error recovery scheme | |
US8595570B1 (en) | Bitline deletion | |
CN105511972A (en) | Method for reducing faults of independent redundant disk array | |
CN107315649A (en) | Table entry verification method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||