CN105843699B

CN105843699B - Dynamic random access memory device and method for error monitoring and correction

Info

Publication number: CN105843699B
Application number: CN201610064309.6A
Authority: CN
Inventors: M.B.希利; H.C.亨特; C.A.基尔默; 金圭贤; W.E.莫尔
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 2015-02-02
Filing date: 2016-01-29
Publication date: 2019-06-04
Anticipated expiration: 2036-01-29
Also published as: CN105843699A

Abstract

Embodiments of the disclosure provide a method for monitoring the integrity of a dynamic random access memory (DRAM) device with embedded error-correcting code (ECC) and for predicting its failure. Additional registers are embedded on the DRAM device to store information about the DRAM, such as the number and location of the soft errors the device has detected. When the DRAM device detects a soft error, the information stored in the additional registers is updated. A controller compares the information stored in the additional registers with associated thresholds. In some embodiments, after comparing the information with the associated thresholds, the controller may decide whether to schedule a repair action. In other embodiments, the controller may decide whether to alert a memory controller that the DRAM may fail.

Description

Dynamic random access memory device and method for error monitoring and correction

Technical field

Generally, this disclosure relates to the field of computing hardware. More particularly, it relates to coupling a dynamic random access memory (DRAM) device with embedded error-correcting code (ECC) to registers that record and correct memory errors, and to monitoring a DRAM device for low-temperature attacks.

Background technique

Dynamic memory cells store charge in capacitors. These capacitors keep shrinking to accommodate ever-increasing memory demands. As the capacitors become smaller, dynamic memory cells become increasingly susceptible to single-cell soft errors caused by reduced cell retention time, electrical or magnetic interference, and background radiation. To control the increase in soft errors, some DRAM manufacturers embed error-correcting code (ECC) directly on the DRAM device rather than relying on the central processing unit (CPU) or the system memory controller.

DRAM capacitors lose their charge over time, so they must be refreshed to avoid losing information. Many DRAM devices have a maximum refresh interval on the order of milliseconds. The rate at which a DRAM capacitor loses its charge may depend on temperature. If DRAM capacitors are cooled suddenly, the charge may persist longer than it would at normal operating temperature, possibly for minutes to hours rather than the usual few seconds.

Summary of the invention

Embodiments of the invention disclose an apparatus and method for monitoring the integrity of a dynamic random access memory (DRAM) device with embedded error-correcting code (ECC) and for predicting its failure. In one embodiment, the disclosure includes a DRAM device with embedded ECC. The DRAM device further includes a register that stores a running count of the number of errors detected and a register group that stores the memory addresses of the errors detected. The DRAM device further includes an ECC controller configured to use the ECC to perform error detection and correction (EDAC).

In another embodiment, the disclosure includes a method for recording and correcting soft errors in a DRAM device with embedded ECC. The DRAM device performs an ECC check on a word to determine whether the word contains any soft errors. When an error is detected, an error count stored in a register on the DRAM device is incremented, and the memory address corresponding to the location of the error is saved in a register group on the DRAM device.

In further embodiments, the disclosure includes a method for predicting failure in a DRAM. The DRAM device receives a set of memory information about the DRAM. The DRAM device processes the set of memory information to determine a set of error indicators. The DRAM device then compares the set of error indicators with their associated thresholds and, if at least one of the error indicators exceeds its associated threshold, sends an alert to a memory controller.

In further embodiments, the disclosure includes a method for detecting a low-temperature attack on a dynamic random access memory (DRAM) device and for responding to the attack. A set of error indicators is determined by processing a set of memory information using a set of decision parameters. The error indicators are then compared with a set of attack signatures. If the error indicators match the set of attack signatures, access to the DRAM device is disabled.

Further embodiments of the disclosure relate generally to systems and computer program products for monitoring a DRAM device for conditions that indicate a low-temperature attack and for responding to the low-temperature attack.

The above summary is not intended to describe each illustrated embodiment or every implementation of the disclosure.

Brief description of the drawings

The drawings included in the present application are incorporated into, and form part of, the specification. They illustrate embodiments of the present invention and, together with the description, serve to explain the principles of the invention. The drawings illustrate only typical embodiments of the invention and are not to be construed as limiting it.

Fig. 1 illustrates a high-level block diagram of an example computer system, according to embodiments of the present disclosure, that may be used to implement one or more of the methods, tools, and modules, and any related functions, described herein.

Fig. 2 is a flowchart illustrating a method for recording errors in a dynamic random access memory (DRAM) device with embedded ECC functionality, according to embodiments of the present disclosure.

Fig. 3 is a flowchart illustrating a method for identifying the need to take a repair action in a DRAM device with embedded ECC functionality, according to embodiments of the present disclosure.

Fig. 4 is a block diagram of an example DRAM device with embedded ECC functionality, an error logging unit, and a fault detection unit, according to embodiments of the present disclosure.

Fig. 5 is a flowchart of a method for predicting failure in a DRAM device with embedded ECC functionality, according to embodiments of the present disclosure.

Fig. 6 is a block diagram of an example DRAM device with embedded ECC functionality, an error logging unit, and a low-temperature attack detection unit, according to embodiments of the present disclosure.

Fig. 7 is a flowchart illustrating a method for detecting a low-temperature attack on a DRAM device with embedded ECC functionality, according to embodiments of the present disclosure.

Detailed description

Generally, this disclosure relates to the field of computing hardware. More particularly, it relates to coupling a dynamic random access memory (DRAM) device with embedded error-correcting code (ECC) to registers that record and flag the need to repair memory, and to monitoring a DRAM device for low-temperature attacks. The disclosure is not limited to such applications, however, and various aspects of the disclosure may be appreciated through the discussion of the various examples described in this specification.

DRAM is a type of random access memory that stores each bit of data in a separate capacitor within a memory cell of an integrated circuit. The capacitor can be charged or discharged to represent the binary value (1 or 0) of a data bit. Occasionally, a bit may spontaneously flip from one binary value to the opposite value, causing a soft error. Electrical or magnetic interference, an alpha particle striking a cell, and background radiation may all cause soft errors.

The consequences of a soft error in memory may depend on the system. In a system without ECC, a soft error may have no noticeable consequence, or it may lead to a system crash or data corruption. For example, suppose a spreadsheet that stores numbers in ASCII format is loaded into an application's memory, the digit "8" is entered into a data cell, and the spreadsheet is then saved. The "8" may be represented by the binary bit sequence 00111000, with each bit of the sequence stored in a separate memory cell. If an alpha particle strikes the memory cell storing the least significant (rightmost) bit of the sequence before the spreadsheet is saved, flipping that bit from 0 to 1, then the data cell that previously contained the digit "8" may contain the digit "9" the next time the spreadsheet is loaded into memory. Although such a change does not always make a system unstable, the effect is unacceptable for systems running scientific and financial computing applications and for file servers.
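
The bit flip in this example can be reproduced in a few lines. The following Python sketch is illustrative only; the helper name `flip_bit` is hypothetical and is not part of the disclosure.

```python
def flip_bit(value: int, bit: int) -> int:
    """Return value with the given bit position inverted (bit 0 is the LSB)."""
    return value ^ (1 << bit)


stored = ord("8")                # ASCII code 56, binary 00111000
corrupted = flip_bit(stored, 0)  # single-bit upset in the least significant bit

print(f"{stored:08b} -> {corrupted:08b}")    # 00111000 -> 00111001
print(f"{chr(stored)} -> {chr(corrupted)}")  # 8 -> 9
```

Flipping the same bit a second time restores the original value, which is why a code that can locate the flipped bit can also correct it.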

Systems that cannot tolerate data corruption may use ECC memory to correct errors as they occur. ECC memory may use additional memory chips to allow check bits to be added. When a memory cell is accessed during a read, write, or refresh operation, the memory controller or, more recently, the central processing unit (CPU) may use the ECC together with the check bits to check for errors. If an error is found, the memory controller or CPU may be able to correct it, depending on the number of bits flipped and the ECC used. Examples of ECCs used to correct soft errors include Hamming codes and Reed-Solomon codes.
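
To make single-bit correction concrete, the sketch below implements a textbook Hamming(7,4) code in Python. It is illustrative only: the disclosure does not specify which code or word size a given device uses, and the function names are hypothetical.

```python
def hamming74_encode(data):
    """Encode 4 data bits as the 7-bit codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(codeword):
    """Return (corrected codeword, error position 1..7, or 0 if none found)."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3  # the syndrome is the 1-based error position
    if position:
        c[position - 1] ^= 1         # flip the faulty bit back
    return c, position


def hamming74_data(codeword):
    """Extract the 4 data bits from a 7-bit codeword."""
    return [codeword[2], codeword[4], codeword[5], codeword[6]]


word = [1, 0, 1, 1]
stored = hamming74_encode(word)
stored[4] ^= 1                       # simulate a soft error at position 5
fixed, position = hamming74_correct(stored)
print(position, hamming74_data(fixed) == word)
```

This code corrects any single flipped bit per codeword. Two flipped bits produce a misleading syndrome, which matches the statement above that correction depends on the number of bits flipped and the ECC used.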

A cold boot attack is a type of side-channel attack in which an attacker with physical access to a computer can retrieve encryption keys from a running operating system after using a cold restart to reboot the machine. The attack relies on the data remanence of DRAM to retrieve memory contents that remain readable after power has been removed. In a cold boot attack, power is cut without letting the operating system perform its shutdown operations, and the memory contents are dumped to a file. A low-temperature attack is a type of cold boot attack in which the DRAM is first cooled to slow the capacitor leakage in each DRAM cell. By slowing the memory leakage in the DRAM device, the attacker can dump more of the DRAM's information to the file, increasing the likelihood of successfully stealing the encryption keys.

As used herein, "memory information" is any information about a DRAM that can be used to predict failure in the DRAM or to detect a low-temperature attack on the memory. For example, memory information may include an error count, the temperature of the DRAM, or a count of the number of sequential read operations performed without an intervening write operation. A "word" is the natural unit of data used by a particular processor design and is related to the size of a bus transfer. Some modern computers and servers use 64-bit words, but other word sizes exist, and the disclosure should not be limited to any particular word size.

"Error rate" refers to the rate at which new errors occur in the DRAM. For example, if 15 new errors occur in a 3-second interval, the error rate is 5 new errors per second. "Error acceleration" is the change in the error rate over a period of time. For example, if the error rate changes from 5 new errors per second to 10 new errors per second over a 1-second interval, the error acceleration is 5 new errors per second squared. An "error indicator" is any information about the DRAM that can be compared with an established threshold to determine whether a DRAM failure is occurring or whether a low-temperature attack on the DRAM is under way. For example, an error indicator may be the error count, the error rate, the error acceleration, or the DRAM temperature. An "associated threshold" is the threshold that corresponds to a given error indicator. For example, the associated threshold for the error count may be the maximum number of errors the DRAM can tolerate, and the associated threshold for the error rate may be the maximum tolerable rate of new errors in the DRAM.

A "repair action" includes any action performed on the DRAM to repair or prevent soft errors. For example, in some embodiments the repair action may be to run a memory scrub operation. In other embodiments, particularly when a memory cell or row has had many errors, the repair action may be to spare or mark the hardware so that the computer system no longer stores information in the affected memory cell or row, or ultimately to replace the DRAM.

Turning now to the drawings, Fig. 1 is a high-level block diagram of an example computer system (e.g., a server) 101 that may be used to implement one or more of the methods, tools, and modules, and any related functions, described herein (e.g., using one or more processor circuits or computer processors of the computer). In some embodiments, the major components of computer system 101 may include one or more CPUs 102, a memory controller 105, a memory 104, a terminal interface 113, a storage interface 114, an input/output (I/O) device interface 116, and a network interface 118, all of which may be communicatively coupled, directly or indirectly, for inter-component communication via a memory bus 103, an I/O bus 112, and an I/O bus interface unit 111.

Computer system 101 may contain one or more general-purpose programmable central processing units (CPUs) 102A, 102B, 102C, and 102D, herein generically referred to as CPU 102. In some embodiments, computer system 101 may contain multiple processors, as is typical of a relatively large system. In other embodiments, computer system 101 may be a single-CPU system. Each CPU 102 may execute instructions stored in memory 104 and may include one or more levels of on-board cache (not shown).

Memory 104 may include computer-system-readable media in the form of volatile memory, such as dynamic random access memory (DRAM) 106. Computer system 101 may further include removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, a storage system may be provided for reading from and writing to non-removable, non-volatile magnetic media (e.g., a "hard drive"). Although not shown, a magnetic disk drive may be provided for reading from and writing to a removable, non-volatile magnetic disk (e.g., a floppy disk), as may an optical disk drive for reading from or writing to a removable, non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media). In addition, memory 104 may include flash memory, such as a flash memory stick or flash drive. Memory devices may be connected to memory bus 103 by one or more data media interfaces. Memory 104 may include at least one program product having a set of (at least one) program modules configured to carry out the functions of the various embodiments.

Memory 104 may also include ECC check bits 107, an ECC controller 108, and an error logging unit 110. The error logging unit may include an error address register (EAR) group 110A and an error count register (ECR) 110B. EAR group 110A may be a set of registers, each of which stores the location of an error detected by ECC controller 108. For example, EAR group 110A may store the row address, the column address, or both the row and column addresses of the memory cell in which ECC controller 108 detected an error. ECR 110B may be a register that stores an error count. The error count is a running tally of the errors found by ECC controller 108. ECR 110B and EAR group 110A may be reset periodically. Alternatively, in some embodiments, ECR 110B and EAR group 110A may be reset by memory controller 105, by ECC controller 108, or by a user as part of a repair action.

ECC controller 108 may be configured to use check bits 107 and an error-correcting code (e.g., a Hamming code or a Reed-Solomon code) to perform forward error correction (FEC) on DRAM 106. The number of ECC check bits 107 stored in memory may depend on the size of DRAM 106 and on the error-correcting code used. ECC controller 108 may also be configured to increment ECR 110B when it detects a soft error and to store information about the error (e.g., the row and column address of the error) in EAR group 110A.

In some embodiments, error logging unit 110 may include additional registers and register groups that store further information about the DRAM. For example, in some embodiments, error logging unit 110 may store a multi-bit error count that is incremented each time ECC controller 108 detects a multi-bit error. In some embodiments, error logging unit 110 may store bank-specific error counts that tally the number of errors found in a particular memory bank of DRAM 106. In addition, error logging unit 110 may store an uncorrectable-error flag that sends an alert to memory controller 105 whenever an uncorrectable error is found in DRAM 106.

In some embodiments, as described above, ECC controller 108 may communicate directly with error logging unit 110, for example to increment the error count in ECR 110B. In other embodiments, however, ECC controller 108 may communicate with error logging unit 110 indirectly. For example, when an error is detected, ECC controller 108 may alert memory controller 105 to the location of the error. Memory controller 105 may then communicate with the error logging unit to increment ECR 110B and store the error address in EAR group 110A.

In some embodiments, there may be multiple memory controllers. For example, a CPU may have an integrated memory controller designed to interact with external memory devices. In some embodiments, an external memory controller may include the ECC controller.

Memory 104 may include additional chips, sensors, or controllers that are not shown. For example, memory 104 may include a temperature sensor, a fault detection unit, or a low-temperature attack detection unit. The fault detection unit may predict the onset of a catastrophic failure in memory 104 and send an alert to a controller. The low-temperature attack detection unit may monitor DRAM 106 for signs that memory 104 is undergoing a low-temperature attack, which is a type of cold boot attack. The temperature sensor may monitor the operating temperature of the DRAM and assist the low-temperature attack detection unit. The fault detection unit is discussed in more detail with reference to Fig. 4 and Fig. 5, and the low-temperature attack detection unit is discussed in more detail with reference to Fig. 6 and Fig. 7.

Although memory bus 103 is shown in Fig. 1 as a single bus structure providing a direct communication path among CPUs 102, memory 104, and I/O bus interface 111, in some embodiments memory bus 103 may include multiple different buses or communication paths, which may be arranged in any of various forms, such as point-to-point links in hierarchical, star, or web configurations, multiple hierarchical buses, parallel or redundant paths, or any other appropriate type of configuration. Furthermore, although I/O bus interface 111 and I/O bus 112 are shown as single respective units, in some embodiments computer system 101 may contain multiple I/O bus interface units 111, multiple I/O buses 112, or both. Further, although multiple I/O interface units are shown, which separate I/O bus 112 from the various communication paths running to the various I/O devices, in other embodiments some or all of the I/O devices may be connected directly to one or more system I/O buses.

In some embodiments, computer system 101 may be a multi-user mainframe computer system, a single-user system, or a server computer or similar device that has little or no direct user interface but receives requests from other computer systems (clients). Further, in some embodiments, computer system 101 may be implemented as a desktop computer, portable computer, laptop or notebook computer, tablet computer, pocket computer, telephone, smartphone, network switch or router, or any other appropriate type of electronic device.

Note that Fig. 1 is intended to depict the representative major components of an exemplary computer system 101. In some embodiments, however, individual components may have greater or lesser complexity than shown in Fig. 1, components other than or in addition to those shown in Fig. 1 may be present, and the number, type, and configuration of such components may vary.

Referring now to Fig. 2, shown is a flowchart of an example method 200 for recording memory information about a DRAM device, according to embodiments of the present disclosure. In some embodiments, method 200 may be performed by the ECC controller 108 embedded in memory 104 (shown in Fig. 1). In other embodiments, method 200 may be performed by the memory controller 105 of computer system 101. The method may begin at operation 202, where a read, write, or refresh operation is performed on the DRAM device.

As part of, or in conjunction with, the read, write, or refresh operation on a word at a memory address, the ECC controller may check the word for errors at operation 204. The ECC controller may use an existing error-correcting code or algorithm (e.g., a Hamming code or a Reed-Solomon code) to check the word for errors.

At operation 206, the ECC controller may determine whether any errors were detected in the word during operation 204. If no error was detected, the method ends. If an error was detected at operation 204, however, the ECC controller may, per operation 208, increment the error count stored in the ECR and store the row address of the error location in an available register in the EAR group.

At operation 210, the ECC controller may compare the error count with an error threshold. For example, the error threshold may be the maximum number of errors tolerable in the DRAM. If the error count is below the threshold, the method ends. If the ECC controller determines that the error count exceeds the error threshold, however, the ECC controller sets an error flag, as described in operation 212. The error flag may be a message, stored in memory or sent directly to the memory controller, indicating that the error count has exceeded the threshold. In some alternative embodiments, the error flag may be set by driving a pin on the DRAM device high or low.
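
Operations 206 to 212 can be sketched in software as follows. This is a minimal Python model under stated assumptions, not the device logic itself: the class `ErrorLoggingUnit`, the function `log_access`, and the number of EAR registers are hypothetical, an address is silently dropped when no EAR register is free, and the flag is set only when the count strictly exceeds the threshold.

```python
from dataclasses import dataclass, field


@dataclass
class ErrorLoggingUnit:
    """Hypothetical software model of the ECR and the EAR group."""
    ear_capacity: int = 4                    # number of EAR registers (assumed)
    ecr: int = 0                             # error count register
    ear: list = field(default_factory=list)  # logged row addresses
    error_flag: bool = False


def log_access(unit, row_address, error_detected, error_threshold):
    """Model operations 206 to 212 for one read, write, or refresh."""
    if not error_detected:                   # operation 206: no error, method ends
        return
    unit.ecr += 1                            # operation 208: increment the ECR
    if len(unit.ear) < unit.ear_capacity:    # store the address if a register is free
        unit.ear.append(row_address)
    if unit.ecr > error_threshold:           # operation 210: compare with the threshold
        unit.error_flag = True               # operation 212: set the error flag


unit = ErrorLoggingUnit(ear_capacity=2)
for row in (0x1A, 0x2B, 0x3C):
    log_access(unit, row, error_detected=True, error_threshold=2)
print(unit.ecr, [hex(a) for a in unit.ear], unit.error_flag)
```

In this run the third error pushes the count past the threshold of 2, so the flag is set, while the EAR model retains only the first two addresses because its assumed capacity is 2.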

Referring now to Fig. 3, shown is a flowchart of an example method 300 for monitoring the integrity of a DRAM device, according to embodiments of the present disclosure. In some embodiments, method 300 may be performed by the ECC controller 108 embedded in memory 104 (shown in Fig. 1). In other embodiments, method 300 may be performed by the memory controller 105 integrated in computer system 101. In still other embodiments, method 300 may be performed by a dedicated controller or chip embedded in the DRAM device. The method may begin at operation 302, where the memory controller determines whether an error flag has been set. For example, the memory controller may determine whether the error flag has been set by reading a message stored at a specific memory address. If a predetermined message indicating that the error threshold has been exceeded is found at that memory address, the memory controller may determine that the error flag has been set.

If the memory controller determines that the error flag has been set, the memory controller may perform a repair action per operation 304. If the memory controller determines at operation 302 that the error flag has not been set, the memory controller may retrieve memory information from the error logging unit per operation 306. The retrieved memory information may include the error count stored in ECR 110B and the list of error addresses stored in EAR group 110A. The memory controller may then determine at operation 308 whether there is a large number of new errors. To do so, the memory controller may compare the total number of new errors with a threshold.

In some embodiments, the memory controller may compare the number of new errors in a particular bank with a bank-specific threshold. In other embodiments, the memory controller may compare the number of new errors at a particular address (including at a particular row or column) with a corresponding address threshold. In either case, the threshold may be configured by a user, or it may be set by the memory manufacturer and stored on the DRAM device or in non-volatile storage on the module. If the memory controller determines that there is not a large number of new errors, the method ends.

When the memory controller determines that there is a large number of new errors, the memory controller may schedule a repair action per operation 310. In some embodiments, the memory controller may perform the repair action immediately after determining that there is a large number of new errors and that repair resources are available.
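
The decision flow of method 300 can be sketched as a single function. This Python sketch is illustrative only; the function name, the string return values, and the choice to measure "new errors" as the difference between the current error count and the count seen on the previous pass are assumptions rather than details taken from the disclosure.

```python
def monitor_pass(error_flag_set, error_count, previous_error_count,
                 new_error_threshold):
    """Return the action chosen in one pass through operations 302 to 310."""
    if error_flag_set:                        # operation 302: flag already set
        return "repair"                       # operation 304: perform a repair action
    # Operation 306: memory information retrieved from the error logging unit.
    new_errors = error_count - previous_error_count
    if new_errors > new_error_threshold:      # operation 308: many new errors?
        return "schedule_repair"              # operation 310: schedule a repair action
    return "none"                             # otherwise the method ends


print(monitor_pass(True, 12, 10, 5))
print(monitor_pass(False, 20, 10, 5))
print(monitor_pass(False, 12, 10, 5))
```

The same comparison could be applied per bank or per address by passing in the corresponding counts and thresholds.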

Referring now to Fig. 4, shown is a block diagram of a DRAM device with embedded ECC for predicting failure of the DRAM, according to embodiments of the present disclosure. The DRAM device includes a DRAM array 402, an ECC controller 406, an error logging unit 408, and a fault detection unit 410, all of which may be communicatively coupled, directly or indirectly, to a computer system (not shown) through an I/O 404 connected to an external I/O 412. I/O 404 is a DRAM driver, embedded on the DRAM device, that supplies voltage or current to external I/O 412. External I/O 412 may be a new, additional pin on the DRAM device, or it may be a new multiplexed definition of an existing pin.

ECC controller 406 may use an error-correcting code (e.g., a Hamming code or a Reed-Solomon code) to detect single-bit and multi-bit errors in DRAM array 402. Depending on the error-correcting code used, ECC controller 406 may also correct the errors it detects, particularly single-bit errors. When ECC controller 406 detects an error, memory information is stored in error logging unit 408.

The memory information may include an error address 408A, which is the memory address of the cell in which ECC controller 406 detected an error, and an error count 408B, which is a running tally of the number of errors found by ECC controller 406 since error logging unit 408 was last reset. In some embodiments, error logging unit 408 may store additional memory information. For example, error logging unit 408 may store a second error count corresponding to the number of multi-bit errors detected by the ECC controller. In other embodiments, ECC controller 406 may store bank-specific error counts that tally the errors ECC controller 406 has detected in each unique bank of DRAM array 402. In still other embodiments, the memory information stored in error logging unit 408 may include an uncorrectable-error count (the number of uncorrectable errors detected by ECC controller 406), the temperature of the DRAM, and an error flag.

Fault detection unit 410 may store decision parameters 411. Decision parameters 411 are the parameters that fault detection unit 410 uses to determine error indicators, such as the error rate and error acceleration of the DRAM, together with the thresholds against which the error indicators are compared to predict failure in the DRAM. For example, the decision parameters may include thresholds 411A, 411B, and 411C and time periods 411D and 411E. First threshold 411A may be the maximum tolerable error count, second threshold 411B may be the maximum tolerable error rate, and third threshold 411C may be the maximum tolerable error acceleration. First time period 411D may be used to calculate the error rate, and second time period 411E may be used to calculate the error acceleration.

Fault detection unit 410 may monitor error count 408B over first time period 411D to determine the error rate. The fault detection unit may then monitor the error rate over second time period 411E to determine the error acceleration. Once the fault detection unit has determined the error count, error rate, and error acceleration, it may compare them with their associated thresholds 411A, 411B, and 411C, respectively. If the error count, error rate, or error acceleration exceeds its associated threshold, the fault detection unit may alert the memory controller that the DRAM may fail or that the memory may become unstable.
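
The two derived indicators can be written as simple difference quotients. The Python sketch below is illustrative only; the function names are hypothetical, and the inputs reuse the figures from the definitions of error rate and error acceleration given earlier in the description (15 new errors in 3 seconds, and a rate that rises from 5 to 10 new errors per second over 1 second).

```python
def error_rate(count_start, count_end, period_s):
    """New errors per second over the first time period."""
    return (count_end - count_start) / period_s


def error_acceleration(rate_start, rate_end, period_s):
    """Change in error rate per second over the second time period."""
    return (rate_end - rate_start) / period_s


rate = error_rate(count_start=0, count_end=15, period_s=3)
accel = error_acceleration(rate_start=5, rate_end=10, period_s=1)
print(rate, accel)  # 5.0 5.0
```

Keeping the two time periods as separate parameters mirrors the separate decision parameters for the rate and the acceleration.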

In some embodiments, the fault detection unit may include a controller that compares the memory information stored in error logging unit 408 with decision parameters 411. In other embodiments, ECC controller 406 or an external memory controller (e.g., memory controller 105 shown in Fig. 1) may be configured to compare the memory information with decision parameters 411. In these embodiments, fault detection unit 410 may only store decision parameters 411.

Fig. 4 depicts the representative major components of an example DRAM device 401. In some embodiments, however, individual components may have greater or lesser complexity than shown in Fig. 4, components other than or in addition to those shown in Fig. 4 may be present, and the number, type, and configuration of such components may vary. For example, in some embodiments, a single memory controller may be configured to perform the functions of both ECC controller 406 and fault detection unit 410.

In some embodiments, decision parameters 411 may be configured by a user. In other embodiments, decision parameters 411 may be preloaded into non-volatile storage by the DRAM manufacturer. In some embodiments, the information stored in error logging unit 408 (e.g., error address 408A and error count 408B) may be reset after a given operation (e.g., a memory scrub operation) is performed on the DRAM. In other embodiments, error logging unit 408 may be reset by fault detection unit 410 or by memory controller 105.

Referring now to Fig. 5, shown is a flowchart of an example method 500 for predicting failure of a DRAM device, according to embodiments of the present disclosure. In some embodiments, method 500 may be performed by the fault detection unit 410 embedded in the DRAM. In other embodiments, method 500 may be performed by the memory controller 105 integrated in computer system 101. The method may begin at operation 502, where the fault detection unit receives memory information from the error logging unit.

At operation 504, the fault detection unit can determine error indicators by processing the memory information received at operation 502. The error indicators can be an error count, an error rate (the rate at which new errors appear in the DRAM), and an error acceleration (the change in the new-error rate over a period of time). In certain embodiments, the error indicators may also include an uncorrectable error rate (the rate at which uncorrectable errors appear in the DRAM) and an uncorrectable error acceleration (the change in the uncorrectable error rate over a period of time). An error is considered uncorrectable if the number of flipped bits exceeds the maximum number that the error-correcting code embedded on the DRAM device can correct. After determining the error indicators (in this example, error count, error rate, and error acceleration), the fault detection unit can determine whether any error indicator exceeds its associated threshold.
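The indicators described for operation 504 can be illustrated with a minimal Python sketch. It assumes the error logging unit is sampled as (time, cumulative error count) pairs; the names `ErrorIndicators` and `compute_indicators` and the three-sample window are hypothetical choices made for illustration and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class ErrorIndicators:
    count: int           # total errors logged so far
    rate: float          # new errors per second over the latest interval
    acceleration: float  # change in rate per second between the last two intervals


def compute_indicators(samples):
    """Derive indicators from (time_s, cumulative_error_count) samples, oldest first.

    Assumption: at least three samples with strictly increasing timestamps.
    """
    if len(samples) < 3:
        raise ValueError("need at least three samples")
    (t0, c0), (t1, c1), (t2, c2) = samples[-3:]
    previous_rate = (c1 - c0) / (t1 - t0)
    rate = (c2 - c1) / (t2 - t1)
    acceleration = (rate - previous_rate) / (t2 - t1)
    return ErrorIndicators(count=c2, rate=rate, acceleration=acceleration)
```

For example, `compute_indicators([(0, 0), (1, 1), (2, 4)])` yields a count of 4, a rate of 3 errors per second, and an acceleration of 2 errors per second squared.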

First, at operation 506, the fault detection unit can determine whether the error count exceeds its associated threshold. If it does, the fault detection unit can send a warning to memory controller 105 (shown in Fig. 1) at operation 512, and the process can then end. If the error count does not exceed its threshold at operation 506, the fault detection unit can determine at operation 508 whether the error rate exceeds its associated threshold. If it does, the fault detection unit can send a warning to memory controller 105 at operation 512, and the process can then end. If the error rate does not exceed its threshold, the fault detection unit can determine at operation 510 whether the error acceleration exceeds its associated threshold. If it does, the fault detection unit can send a warning to memory controller 105 at operation 512, and the process can then end. If the error acceleration does not exceed its threshold, the method can restart at operation 502, where the fault detection unit again receives memory information from the error logging unit.
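The sequence of checks in operations 506 through 512 can be sketched in Python as follows. This is an illustrative model only: the function names, the dictionary of thresholds, and the `warn` callback standing in for the warning to memory controller 105 are all assumptions.

```python
def first_exceeded(count, rate, acceleration, thresholds):
    """Return the name of the first indicator above its threshold, else None.

    The order mirrors operations 506 (count), 508 (rate), and 510 (acceleration).
    """
    checks = (("count", count), ("rate", rate), ("acceleration", acceleration))
    for name, value in checks:
        if value > thresholds[name]:
            return name
    return None


def predict_failure(readings, thresholds, warn):
    """Process readings until one trips a threshold, then warn and stop.

    readings: iterable of (count, rate, acceleration) tuples (operations 502/504).
    warn: hypothetical callback standing in for the warning of operation 512.
    """
    for count, rate, acceleration in readings:
        tripped = first_exceeded(count, rate, acceleration, thresholds)
        if tripped is not None:
            warn(tripped)
            return tripped
    return None  # no reading exceeded any threshold
```

Checking the cheapest indicator first and stopping at the first violation matches the early-exit structure of the flowchart; a real device would poll the error logging unit rather than iterate over a finite list.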

The warning to memory controller 105 at operation 512 can be implemented in many ways. In certain embodiments, sending the warning to memory controller 105 may include raising or lowering a dedicated pin on the DRAM device, for example, driving a parity error pin on the DRAM device high or low. In still other embodiments, sending the warning may include intentionally corrupting the data being read out in order to trigger a cyclic redundancy check (CRC) error. In yet other embodiments, sending the warning may include sending the controller a predefined data pattern that the controller recognizes as an error signal. The present disclosure does not require, and should not be limited to, any particular method of warning the memory controller when an error indicator exceeds its associated threshold.

Referring now to Fig. 6, a block diagram of a DRAM device with embedded ECC that can detect and respond to a low-temperature attack is shown, according to an embodiment of the present disclosure. The DRAM device includes DRAM array 402, ECC controller 406, temperature sensor 618, error logging unit 408, low-temperature attack detection unit 610, and fuse 612. The DRAM device is communicatively coupled to a computer system (not shown) through I/O 404, which is connected to data pins 620, and through command (CMD) decoder 614, which is connected to command (CMD) pins 616.

Data pins 620 are the data I/O pins of the DRAM device. They are bidirectional, with an input capability that allows data to be written and an output capability that allows data to be read. Depending on the device, there may be 4, 8, 16, or 32 data pins 620. Other configurations are also possible, and the present disclosure should not be limited to devices with any particular number of data pins 620.

CMD pins 616 are a group of input pins on the DRAM device that carry commands (for example, read, write, refresh) multiplexed with the address of the cells to be accessed. CMD decoder 614 is a decoder on the DRAM device that interprets the encoded command input presented on CMD pins 616 and enables the DRAM to perform the appropriate operation (for example, read, write, refresh).

DRAM array 402, ECC controller 406, and error logging unit 408 operate as described above with reference to Fig. 4. ECC controller 406 uses the error-correcting code to detect errors in DRAM array 402 and stores memory information in error logging unit 408. In addition, error logging unit 408 stores DRAM temperature 608C, which it receives from temperature sensor 618.
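As a rough software model of the error logging unit, the following Python sketch keeps an error count, a fixed set of error address registers, and the latest temperature reading. The class name `ErrorLog`, the default of four address registers, and the behavior when no register is free are assumptions made for illustration; the disclosure does not specify them.

```python
class ErrorLog:
    """Toy model of error logging unit 408 (register count is an assumption)."""

    def __init__(self, address_registers=4):
        self.error_count = 0                               # error count 408B
        self.error_addresses = [None] * address_registers  # error addresses 408A
        self.temperature_c = None                          # DRAM temperature 608C

    def record_error(self, address):
        """Count the error and store its address in the first free register."""
        self.error_count += 1
        for i, slot in enumerate(self.error_addresses):
            if slot is None:
                self.error_addresses[i] = address
                return True
        return False  # assumption: with no free register, only the count is kept

    def record_temperature(self, temperature_c):
        """Store the latest reading from the temperature sensor."""
        self.temperature_c = temperature_c

    def reset(self):
        """Clear the error count and addresses, as after a reset operation."""
        self.error_count = 0
        self.error_addresses = [None] * len(self.error_addresses)
```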

Much like the fault detection unit described herein, low-temperature attack detection unit 610 stores multiple decision parameters 611. Decision parameters 611 can be configured by the user or set by the manufacturer to help low-temperature attack detection unit 610 determine whether the DRAM is undergoing a low-temperature attack.

For example, low-temperature attack detection unit 610 can use first time period 611C to calculate the uncorrectable error rate and second time period 611D to calculate the uncorrectable error acceleration. Low-temperature attack detection unit 610 can then determine whether the uncorrectable error rate and the uncorrectable error acceleration exceed first threshold 611A and second threshold 611B, respectively. Alternatively, in certain embodiments, low-temperature attack detection unit 610 can compare DRAM temperature 608C with temperature threshold 611E to determine whether the DRAM is operating at a temperature indicative of a low-temperature attack.

If low-temperature attack detection unit 610 determines that the DRAM is undergoing a low-temperature attack, it can disable access to the information on the DRAM. In some embodiments, this can be done by blowing fuse 612. For example, fuse 612 can be an electronic fuse (e-fuse), a programmable resistor, or a phase-change resistor.

Referring now to Fig. 7, a flowchart of an example method 700 for detecting and responding to a low-temperature attack on a DRAM device is shown, according to an embodiment of the present disclosure. In some embodiments, method 700 can be performed by low-temperature attack detection unit 610 embedded in memory 104. In some other embodiments, method 700 can be performed by memory controller 105 integrated in computer system 101. In still other embodiments, the method can be included in the DRAM control logic.

First, method 700 may include determining a set of error indicators by processing a set of memory information using a set of decision parameters. The low-temperature attack detection unit can then compare this set of error indicators with a predetermined attack signature group. The attack signature group can be a set of conditions that may arise when a DRAM device undergoes a low-temperature attack or is electrically probed. For example, in certain embodiments the attack signature group can be a high error rate, particularly when the errors are uncorrectable. In some other embodiments, the attack signature group can be a high error acceleration, again particularly when the errors are uncorrectable. In still other embodiments, the attack signature group can be an extremely low temperature.

In some embodiments of the present disclosure, the attack signature group can be stored by the DRAM manufacturer in non-volatile memory on the DRAM device. For example, the attack signature group can be the thresholds stored as decision parameters in the low-temperature attack detection unit. In some other embodiments, the attack signature group can be configured by the user. In still other embodiments, the DRAM manufacturer can store a first attack signature group in non-volatile memory on the DRAM device, and the user can define additional attack signature groups to meet specific needs.

In certain embodiments, the attack signature group can be a combination of a high error rate, a high error acceleration, and/or a low temperature. For example, in example method 700 the attack signature group is characterized as a high uncorrectable error rate (above a first threshold) together with an elevated uncorrectable error acceleration (above a second threshold), where the elevated uncorrectable error acceleration is lower than the acceleration that would be expected if the computer system were undergoing a normal shutdown operation (that is, below a third threshold). Method 700 may begin at operation 702, where the low-temperature attack detection unit receives memory information from the error logging unit.

At operation 704, the low-temperature attack detection unit can determine the uncorrectable error (UE) rate and acceleration. The UE rate can be determined by counting the uncorrectable errors found during a specified time period (for example, first time period 611C stored in low-temperature attack detection unit 610). For example, if first time period 611C is 2 seconds and 4 new uncorrectable errors are detected during those 2 seconds, the UE rate is determined to be 2 uncorrectable errors per second. The UE acceleration can be determined by calculating the change in the UE rate over a period of time (for example, second time period 611D). For example, if the UE rate changes from 1 uncorrectable error per second to 3 uncorrectable errors per second over a 1-second time period, the UE acceleration can be determined to be 2 uncorrectable errors per second squared.
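The two worked examples in operation 704 can be written out as equations. The symbols are introduced here for illustration only: \(N_{UE}\) is the number of new uncorrectable errors, \(T_1\) and \(T_2\) are the first and second time periods, and \(r_{UE}\) and \(a_{UE}\) are the UE rate and UE acceleration.

```latex
r_{UE} = \frac{N_{UE}}{T_1} = \frac{4\ \text{errors}}{2\ \text{s}} = 2\ \text{errors/s}
\qquad
a_{UE} = \frac{r_{UE}^{\text{end}} - r_{UE}^{\text{start}}}{T_2}
       = \frac{(3 - 1)\ \text{errors/s}}{1\ \text{s}} = 2\ \text{errors/s}^2
```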

At operation 706, the low-temperature attack detection unit can determine whether the UE rate exceeds the first threshold. For example, this can be done by comparing the calculated UE rate with first threshold 611A stored in low-temperature attack detection unit 610. If the UE rate does not exceed the first threshold, the method can restart at operation 702. If the UE rate exceeds the first threshold, then at operation 708 the low-temperature attack detection unit can determine whether the UE acceleration exceeds the second threshold.

If the UE acceleration does not exceed the second threshold, the method can restart at operation 702. If the UE acceleration exceeds the second threshold, the low-temperature attack detection unit can determine at operation 710 whether the UE acceleration exceeds the third threshold. Comparing the UE acceleration with the third threshold can, for example, distinguish the effect of a low-temperature attack on the memory from the effect of a system restart or shutdown. If the UE acceleration exceeds the third threshold (indicating that the system is shutting down or restarting), the method can restart at operation 702. If the UE acceleration does not exceed the third threshold, the low-temperature attack detection unit can disable access to the information on the DRAM at operation 712, and the process can end.
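The three comparisons of operations 706 through 710 can be sketched as a small Python function. The names `Verdict` and `classify` are hypothetical, and the threshold values used in any call are the caller's own; the disclosure gives no numeric thresholds.

```python
from enum import Enum


class Verdict(Enum):
    KEEP_MONITORING = "keep_monitoring"  # restart at operation 702
    DISABLE_ACCESS = "disable_access"    # proceed to operation 712


def classify(ue_rate, ue_acceleration, first, second, third):
    """Apply the threshold checks of operations 706, 708, and 710 in order."""
    if ue_rate <= first:
        return Verdict.KEEP_MONITORING   # 706: UE rate not above first threshold
    if ue_acceleration <= second:
        return Verdict.KEEP_MONITORING   # 708: acceleration not above second threshold
    if ue_acceleration > third:
        return Verdict.KEEP_MONITORING   # 710: looks like a shutdown or restart
    return Verdict.DISABLE_ACCESS        # acceleration lies between second and third
```

Access is disabled only when the UE rate is above the first threshold and the UE acceleration falls in the band above the second threshold and at or below the third.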

At operation 712, there are many ways to disable access to the information on the DRAM. For example, in certain embodiments a permanent self-destruct mechanism, such as blowing fuse 612 (cutting off the power supply or the voltage regulator output), can be used to block access to the information. In some other embodiments, when a low-temperature attack is detected, the DRAM blows fuse 612 and thereby disables CMD decoder 614. Once CMD decoder 614 is disabled, all incoming commands are ignored, and the DRAM does not allow its data to be read.
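The effect of blowing the fuse on command handling can be modeled with a short Python sketch. The class `CmdDecoder`, its method names, and the string command set are illustrative assumptions, not part of the disclosure.

```python
class CmdDecoder:
    """Toy model of CMD decoder 614 gated by fuse 612."""

    VALID_COMMANDS = {"read", "write", "refresh"}

    def __init__(self):
        self.fuse_blown = False

    def blow_fuse(self):
        """Permanently disable the decoder; this model offers no way to undo it."""
        self.fuse_blown = True

    def decode(self, command):
        """Return the command to execute, or None if it is ignored."""
        if self.fuse_blown:
            return None  # every incoming command is ignored once the fuse is blown
        return command if command in self.VALID_COMMANDS else None
```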

In still other embodiments, the DRAM can write the entire array to a known state. An overwrite of all cells can be accomplished by using the refresh address counter, which supplies row addresses, to cycle through all row addresses while performing a special write cycle that forces all sense amplifiers into a predetermined state.
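The row-by-row overwrite can be illustrated in Python by treating the array as a list of rows and the refresh address counter as a wrapping index. The function name `overwrite_array` and the fill value are assumptions; real hardware would force the sense amplifiers rather than assign list elements.

```python
def overwrite_array(array, fill=0):
    """Overwrite every row of a 2-D array with a known value, one row per cycle."""
    rows = len(array)
    row_address = 0  # stands in for the refresh address counter
    for _ in range(rows):
        # The special write cycle: every cell in the selected row takes the fill value.
        array[row_address] = [fill] * len(array[row_address])
        row_address = (row_address + 1) % rows  # counter wraps after the last row
    return array
```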

As discussed in more detail herein, it should be appreciated that some or all of the operations of some embodiments of the methods described herein may be performed in an alternative order or may not be performed at all; in addition, multiple operations may occur at the same time or as an internal part of a larger process.

The present invention can be a system, a method, and/or a computer program product. The computer program product may include a computer-readable storage medium containing computer-readable program instructions for causing a processor to carry out aspects of the present invention.

The computer-readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. For example, the computer-readable storage medium can be, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer-readable storage medium includes: a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device (for example, punch cards or raised structures in a groove having instructions recorded thereon), and any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (for example, light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.

The computer-readable program instructions described herein can be downloaded to respective computing/processing devices from a computer-readable storage medium, or to an external computer or external storage device via a network such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium within the respective computing/processing device.

The computer-readable program instructions for carrying out operations of the present invention can be assembler instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer-readable program instructions can execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter scenario, the remote computer can be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection can be made to an external computer (for example, through the Internet using an Internet service provider). In certain embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) can execute the computer-readable program instructions by using state information of the computer-readable program instructions to personalize the electronic circuitry, in order to carry out aspects of the present invention.

Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.

These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the blocks of the flowchart and/or block diagram. These computer program instructions can also be stored in a computer-readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instructions that implement the functions/acts specified in the blocks of the flowchart and/or block diagram.

The computer-readable program instructions can also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus, or other devices to produce a computer-implemented process, such that the instructions executing on the computer, other programmable data processing apparatus, or other devices implement the functions/acts specified in the blocks of the flowchart and/or block diagram.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram can represent a module, segment, or portion of instructions that comprises one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may in fact be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the various embodiments. As used herein, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "includes" and/or "including," when used in this specification, specify the presence of the stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or combinations thereof. In the foregoing detailed description of exemplary embodiments, reference was made to the accompanying drawings (where like numbers represent like elements), which form a part of the detailed description and in which specific exemplary embodiments in which the various embodiments can be practiced are shown by way of illustration. These embodiments were described in sufficient detail to enable those skilled in the art to practice them, but other embodiments can be used, and logical, mechanical, electrical, and other changes can be made, without departing from the scope of the various embodiments. In the foregoing description, numerous specific details were set forth to provide a thorough understanding of the various embodiments. However, the various embodiments can be practiced without these specific details. In other instances, well-known circuits, structures, and techniques have not been described in detail in order to keep the embodiments clear.

Different instances of the word "embodiment" as used in this specification do not necessarily refer to the same embodiment, but they may. Any data and data structures illustrated or described herein are examples only, and in other embodiments different amounts of data, types of data, fields, numbers and types of fields, field names, numbers and types of rows, records, entries, or organizations of data can be used. In addition, any data can be combined with logic, so that a separate data structure may not be necessary. The foregoing detailed description is therefore not to be taken in a limiting sense.

Although the present invention has been described above in terms of specific embodiments, it is anticipated that alterations and modifications of it will become apparent to those skilled in the art. It is therefore intended that the following claims be interpreted as covering all such alterations and modifications as fall within the spirit and scope of the invention.

Claims

1. A dynamic random access memory (DRAM) device with embedded error-correcting code (ECC), the DRAM device comprising:

a DRAM array;

a first register that stores an error count;

a first register group that stores a set of error addresses; and

an ECC controller, wherein the ECC controller is configured to perform error checking and correction (EDAC) on the DRAM array using the ECC, to increment the error count each time an error is detected, and to write the error address to an available register in the first register group,

wherein the DRAM device further comprises a temperature sensor that monitors the temperature of the DRAM device and a non-volatile memory that stores a temperature threshold value.

2. The DRAM device according to claim 1, further comprising a second register that stores a multi-bit error count.

3. The DRAM device according to claim 1, further comprising a second register that stores an uncorrectable error flag.

4. The DRAM device according to claim 1, further comprising a second register group that stores a set of group-specific error counts, wherein each group-specific error count in the set corresponds to a unique memory group in the DRAM array.

5. The DRAM device according to claim 1, further comprising:

a second register group that stores multiple decision parameters; and

a fault detection unit that predicts failure of the DRAM device and detects a failed row or column of the DRAM device.

6. A method for recording and correcting dynamic random access memory (DRAM) errors, the method comprising:

detecting an error in a word in a DRAM device using an error-correcting code;

in response to detecting the error, incrementing an error count stored in a first register; and

in response to detecting the error, storing an error address corresponding to the location of the error in an available register in a first register group,

wherein the method further comprises:

determining whether the error is located in a first memory group; and

in response to detecting that the error is in the first memory group, incrementing a first group-specific error count, wherein the first group-specific error count stores the number of errors in the first memory group, and the first group-specific error count is stored in a second register group.

7. The method according to claim 6, further comprising:

determining whether the error is a multi-bit error; and

in response to the error being a multi-bit error, incrementing a multi-bit error count stored in a second register.

8. The method according to claim 6, further comprising:

determining whether the error is uncorrectable; and

in response to the error being uncorrectable, setting an uncorrectable error flag.

9. The method according to claim 6, further comprising:

determining whether the error count exceeds a threshold; and

in response to the error count exceeding the threshold, setting an error flag.

10. The method according to claim 9, further comprising performing a repair action in response to the error flag being set.

11. The method according to claim 6, further comprising:

determining a first error count at a first time;

determining a second error count at a second time, the second time being after the first time;

determining the number of new errors by comparing the second error count with the first error count;

determining whether the number of new errors is greater than a new-error threshold; and

in response to determining that the number of new errors exceeds the new-error threshold, scheduling a repair action.

12. A method for predicting failure in a DRAM device, the method comprising:

receiving memory information about the DRAM device;

processing the memory information using a set of decision parameters to determine an error indicator;

determining whether the error indicator exceeds an associated error threshold; and

in response to the error indicator exceeding the associated error threshold, issuing a warning to a controller,

wherein:

the error indicator includes an error rate and an error acceleration; and

determining whether the error indicator exceeds the associated error threshold includes comparing the error rate with a first threshold and comparing the error acceleration with a second threshold.

13. The method according to claim 12, wherein issuing the warning to the controller includes raising a dedicated pin.

14. The method according to claim 12, wherein issuing the warning to the controller includes sending a predefined read data pattern to the controller.

15. A method for detecting and responding to a low-temperature attack on a dynamic random access memory (DRAM) device, the method comprising:

receiving a set of memory information about the DRAM device;

processing the set of memory information using a set of decision parameters to determine a set of error indicators;

determining whether the set of error indicators matches an attack signature group; and

in response to the set of error indicators matching the attack signature group, disabling access to the DRAM device, wherein:

the set of error indicators includes an error rate and an error acceleration; and

determining whether the set of error indicators matches the attack signature group includes comparing the error rate with a first threshold and comparing the error acceleration with a second threshold.

16. The method according to claim 15, wherein disabling access to the DRAM device includes blowing a fuse on the DRAM device.

17. The method according to claim 15, wherein disabling access to the DRAM device includes writing the DRAM array to a known state.

18. The method according to claim 15, wherein the attack signature group is configured by a user.

19. A system for detecting and responding to a low-temperature attack on a dynamic random access memory (DRAM) device, comprising:

a memory; and

DRAM control logic, wherein the DRAM control logic causes the memory to perform the following method:

receiving a set of memory information about the DRAM device;

the set of error indicators includes an error rate and an error acceleration; and

20. A computer-readable storage medium for detecting and responding to a low-temperature attack on a dynamic random access memory (DRAM) device, wherein program instructions are stored in the computer-readable storage medium and, when executed, perform the steps of any one of claims 15 to 18.