Summary of the invention
The present invention aims to overcome a defect of the prior art: under wide temperature-difference conditions, the prior art cannot reliably read and write data and cannot correct itself. The object of the present invention is to provide a storage device that performs data error correction using a temperature-difference equalization method.
The present invention provides a storage device that performs data error correction using a temperature-difference equalization method. The device has a storage chip, a control chip, and an interface for connecting to an external data source. A control program is stored in the control chip and implements the following steps:
dividing the storage chip into multiple storage regions, each region comprising multiple storage cells;
as data are written into the storage cells, recording the temperature T1 at the moment each storage cell becomes full and writing it as metadata, thereby obtaining the write temperature T1 of every storage cell, and calculating the overall mean write temperature T of the entire storage chip, which is filled through multiple writes;
after writing is complete, monitoring in real time the ambient temperature T2 of the storage chip;
judging whether the difference between the ambient temperature T2 and the mean write temperature T exceeds a threshold t, and if it does, proceeding to the next step;
when the difference exceeds the threshold t, further judging whether the ambient temperature T2 has reached a set ideal write temperature; when it has, calculating, for each storage region, the sum of the differences between the write temperature T1 of every storage cell in that region and the ambient temperature T2, thereby obtaining the difference sum of every storage region;
sorting the storage regions by their difference sums and checking and correcting the data in all storage regions in order from the largest sum to the smallest.
The storage device of the present invention may further have the following feature: the storage cell is a memory block or a memory page.
The storage device of the present invention may further have the following feature: the process of checking the data comprises comparing, within a storage region, the differences between the ambient temperature and the write temperature T1 of all storage cells, sorting the storage cells by the magnitude of these differences, and scanning the data of the storage cells in order from the largest difference to the smallest to obtain the bit error rate.
The storage device of the present invention may further have the following feature: the data error correction process comprises:
when the data bit error rate reaches the threshold but is below the hardware ECC error correction capability of the storage chip itself, starting hardware ECC error correction, writing the fully recovered data back into a new storage cell, and recording the write temperature T1 at the time of writing;
when the data bit error rate exceeds the hardware ECC error correction capability of the storage chip itself, starting an additional data recovery mechanism to recover the data.
The storage device of the present invention may further have the following feature: the hardware ECC error correction capability of the storage chip itself is set at 70-85% of the maximum bit error rate that the hardware ECC can correct.
The storage device of the present invention may further have the following feature: the additional data recovery mechanism includes RAID data recovery.
The storage device of the present invention may further have the following feature: the ideal write temperature is set according to factors such as the product specification of the data storage device, the actual operating environment, the flash memory type, and the life cycle.
The storage device of the present invention may further have the following feature: the write temperatures T1 of all storage cells in a storage region are stored, ordered by magnitude, in a designated storage region.
The storage device of the present invention may further have the following feature: the storage device is a solid-state drive (SSD).
The storage device of the present invention may further have the following feature: the storage chip of the SSD is an SLC, MLC, TLC or QLC flash memory chip.
The functions and effects of the present invention are as follows. In the storage device of the present invention, a control program stored in the control chip first divides the storage chip into multiple storage regions, each comprising multiple storage cells. As data are written, whenever a storage cell becomes full, the temperature T1 at that moment is recorded and written as metadata, giving every storage cell its own write temperature T1, and the overall mean write temperature T of the entire storage chip, which is filled through multiple writes, is calculated. After writing is complete, the ambient temperature T2 of the storage chip is monitored in real time, and the device judges in real time whether the difference between T2 and the mean write temperature T exceeds the threshold t. Once it does, the device further judges whether T2 has reached the set ideal write temperature; when it has, the sum of the differences between the write temperatures T1 of all storage cells in each storage region and T2 is calculated, giving the difference sum of every storage region. Finally, the storage regions are sorted by difference sum, and their data are checked and corrected in order from the largest sum to the smallest.
The invention therefore tracks the temperature of the storage chip after data have been written, and triggers the temperature-difference-equalized data recovery only when the difference between the ambient temperature and the overall mean write temperature exceeds the set threshold. Detection and recovery are deferred until the ambient temperature reaches the set optimal write temperature; the sum of the differences between the write temperatures T1 of all storage cells in each region and the ambient temperature is then calculated, the regions are ranked by these sums, and bit error rate checking and data recovery are performed from the largest sum to the smallest. This not only recovers data that are at risk of loss, but also rewrites the corrected data at the optimal write temperature, giving a good error-correction and data-consolidation effect.
Embodiment 1
Fig. 1 is a schematic structural diagram of the storage device that performs data error correction using the temperature-difference equalization method in an embodiment of the present invention.
As shown in Fig. 1, the storage device 100, which performs data error correction using the temperature-difference equalization method, has a storage chip 10, a control chip 20, an interface 30 for connecting to an external data source, a PCB 40, and a temperature sensor 50. A control program is stored in the control chip.
The storage chip 10 is a storage chip made of SLC, MLC, TLC or QLC flash memory.
In this embodiment, the storage chip is a NAND flash memory chip, specifically one built from SLC, MLC, TLC or QLC NAND flash. In principle, other kinds of storage chips, such as NOR flash, ROM, PROM, EPROM, EEPROM, Flash ROM, FRAM, MRAM, RRAM and PCRAM, can also serve as the storage chip of the present invention.
SLC (Single-Level Cell) stores 1 bit per cell. It is fast, has a long service life and is very expensive (roughly three or more times the price of MLC), with about 100,000 program/erase cycles.
MLC (Multi-Level Cell) stores 2 bits per cell. Its speed, service life and price are all moderate, with about 3,000 to 10,000 program/erase cycles.
TLC (Trinary-Level Cell, called 8LC by some flash manufacturers) stores 3 bits per cell. It is relatively slow, has a relatively short service life and is cheap, with about 500 program/erase cycles.
QLC (Quad-Level Cell) stores 4 bits per cell using 16 charge levels; it is the slowest type and has the shortest service life.
In brief, among these NAND flash types, SLC offers the best performance at a very high price and is typically used in enterprise-grade or high-end enthusiast products. MLC offers adequate performance at moderate cost and is the mainstream choice for consumer SSDs. TLC has the lowest overall performance and the lowest price, but its shortcomings can be offset by a high-performance controller and controller algorithms.
The control chip 20 is a general-purpose, commercially available SSD controller. For example, as a SATA3 controller, the 88SS1074 or 88SS1079 controller from Marvell (Marvell Technology Group Ltd., USA; its Chinese name has since changed) may be selected, which is suitable for the SATA data interface;
as an NVMe controller, the 88SS1093 or 88SS1092 controller from Marvell may be selected, which is suitable for the PCIe data interface under the NVMe protocol.
Marvell is listed here only as an example; in practice, an SSD controller from any manufacturer on the market can be used, and the invention is not limited to Marvell.
The interface 30 connects to the data source; the interfaces used include PCIe and SATA.
The PCB 40 serves as the circuit carrier for the above hardware: the storage chip 10, the control chip 20 and the interface 30 are all mounted on the PCB 40. The PCB is provided with the temperature sensor 50 for detecting the temperature of the storage chip 10.
Fig. 2 is a schematic diagram of the steps of the method for data error correction using the temperature-difference equalization method, corresponding to the control program stored in the control chip in an embodiment of the present invention.
The control program stored in the control chip 20 implements data error correction using the temperature-difference equalization method. As shown in Fig. 2, the method comprises the following steps:
Step S1: divide the storage chip into multiple storage regions, each region comprising multiple storage cells.
Here, a storage cell is a memory block or a memory page. Generally, the capacity of one basic storage cell is 16 KB, although this figure varies with the manufacturer of the flash memory.
Step S2: as data are written into the storage cells, whenever a storage cell becomes full, record the temperature T1 at that moment and write it as metadata, thereby obtaining the write temperature T1 of every storage cell; then calculate the overall mean write temperature T of the entire storage chip, which is filled through multiple writes.
The write temperatures T1 of all storage cells in each storage region are stored, ordered by magnitude, in a designated storage region.
The mean temperature T is obtained by summing the write temperatures of all storage cells and dividing by the number of write temperatures.
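The bookkeeping in step S2 can be sketched as follows. This is a minimal in-memory illustration, not controller firmware: the class and method names (`CellMeta`, `WriteTemperatureLog`, `on_cell_full`) are hypothetical, temperatures are assumed to be in °C, and in a real device the T1 values would be persisted as metadata in flash rather than held in a Python list.

```python
from dataclasses import dataclass, field
from statistics import fmean


@dataclass
class CellMeta:
    """Hypothetical metadata record for one storage cell."""
    cell_id: int
    region_id: int
    t1: float  # write temperature T1 in °C, sampled when the cell became full


@dataclass
class WriteTemperatureLog:
    """In-memory stand-in for the T1 metadata kept by the control program."""
    records: list[CellMeta] = field(default_factory=list)

    def on_cell_full(self, cell_id: int, region_id: int, sensor_temp_c: float) -> None:
        # A storage cell has just been filled: record the current sensor reading as T1.
        self.records.append(CellMeta(cell_id, region_id, sensor_temp_c))

    def mean_write_temperature(self) -> float:
        # Overall mean write temperature T: sum of all T1 divided by their count.
        if not self.records:
            raise ValueError("no write temperatures recorded yet")
        return fmean(r.t1 for r in self.records)

    def region_t1_sorted(self, region_id: int) -> list[float]:
        # T1 values of one storage region, ordered by magnitude.
        return sorted(r.t1 for r in self.records if r.region_id == region_id)
```

For example, cells filled at 35 °C, 25 °C and 30 °C give a mean write temperature T of 30 °C.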
When data are written, they are protected by ECC through the control chip of the storage device (the device comprising the control chip and the storage chip).
ECC is the abbreviation of "Error Correcting Code", i.e., error checking and correction. ECC is a technique for detecting and correcting errors, and ECC protection applies this technique to protect stored data: when data are stored, the control chip generally writes the corresponding ECC code into the storage chip alongside them, which allows the stored data to be recovered in hardware. ECC is also expanded as Error Correction Code, Error Checking and Correcting, or Error Correction Circuit; it is a mature data protection and recovery mechanism widely applied in data storage devices.
Step S3: after writing is complete, monitor in real time the ambient temperature T2 of the storage chip.
Because data may be written continuously or intermittently, and in practice writing is interleaved with reading and idle periods, "writing is complete" here may refer either to a single storage cell or to an artificially defined storage region.
The ambient temperature T2 is sampled once every 1-30 seconds. The sampling frequency depends on the working environment of the storage chip and on how often it is read and written: if the temperature of the working environment fluctuates strongly and the chip is read and written frequently, the chip's own temperature will also change sharply, so the sampling frequency should be correspondingly higher.
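One way the 1-30 second sampling interval for T2 might be chosen is sketched below. The linear mapping and the reference constants `swing_ref_c` and `io_ref` are purely hypothetical assumptions; the text only states that stronger temperature fluctuation and heavier read/write activity call for more frequent sampling.

```python
def sampling_interval_s(temp_swing_c: float, io_ops_per_s: float,
                        swing_ref_c: float = 10.0, io_ref: float = 1000.0) -> float:
    """Return a T2 sampling interval, in seconds, within the 1-30 s range from the text.

    temp_swing_c: recent temperature swing of the working environment (°C).
    io_ops_per_s: recent read/write rate of the storage chip.
    swing_ref_c, io_ref: hypothetical tuning constants, not taken from the source.
    """
    min_s, max_s = 1.0, 30.0
    # Normalise each pressure to [0, 1]; the stronger one decides the interval.
    temp_pressure = min(max(temp_swing_c / swing_ref_c, 0.0), 1.0)
    io_pressure = min(max(io_ops_per_s / io_ref, 0.0), 1.0)
    activity = max(temp_pressure, io_pressure)
    # Calm and idle -> 30 s; volatile or busy -> 1 s; linear in between.
    return max_s - activity * (max_s - min_s)
```

Taking the maximum of the two pressures means that either a volatile environment or a busy chip alone is enough to shorten the interval.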
Step S4: judge whether the difference between the ambient temperature T2 and the mean write temperature T exceeds the threshold t, and if it does, proceed to the next step.
Specifically, for each storage cell, the difference between the real-time ambient temperature T2 and the mean write temperature T is calculated in real time, and the device judges whether this difference exceeds the set threshold t.
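The comparison in step S4 amounts to a simple check on each new T2 sample. The sketch below assumes that "difference" means the absolute difference, so that both heating and cooling relative to the mean write temperature T count; the function names are illustrative.

```python
from typing import Iterable, Optional


def gap_exceeds(t2_c: float, mean_write_c: float, threshold_c: float) -> bool:
    # Assumption: absolute difference, so cooling and heating both count.
    return abs(t2_c - mean_write_c) > threshold_c


def first_trigger(samples: Iterable[float], mean_write_c: float,
                  threshold_c: float) -> Optional[int]:
    """Index of the first T2 sample whose gap to T exceeds t, or None if none does."""
    for i, t2_c in enumerate(samples):
        if gap_exceeds(t2_c, mean_write_c, threshold_c):
            return i
    return None
```

With T = 30 °C and t = 20 °C, the sample sequence 30, 45, 60 °C first triggers at index 2, and a single 5 °C sample triggers immediately because of the absolute-value assumption.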
As described in the background art, data written to the storage chip are sensitive to temperature: if the difference between the write temperature and the ambient temperature reaches a certain value, namely the threshold t, the risk of data loss becomes very large. The threshold t depends on the type of storage chip, its manufacturing process and its manufacturer.
The usage environment also affects the setting of the threshold t. Because data differ in importance, a smaller threshold t is set for especially important, valuable data, so that even a small temperature change triggers the checking operations that follow step S4, better protecting the integrity of these data.
The storage chip type also matters. For example, SLC NAND flash is more stable and reliable than MLC NAND flash and more resistant to temperature changes; even when the temperature changes, its data remain more stable than those of MLC or TLC. A storage chip using SLC flash can therefore use a relatively large threshold t.
Likewise, differences in manufacturing process and manufacturer lead to different thresholds t for the temperature difference. The applicant suggests a threshold range of 20-80 °C.
Step S5: when the difference exceeds the threshold t, further judge whether the ambient temperature T2 has reached the set ideal write temperature; when it has, calculate, for each storage region, the sum of the differences between the write temperature T1 of every storage cell in that region and the ambient temperature T2, thereby obtaining the difference sum of every storage region.
The ideal write temperature is set according to factors such as the product specification of the data storage device, the actual operating environment, the flash memory type and the life cycle. Data written when the ambient temperature is at the set ideal write temperature are more reliable, less prone to errors or loss, and better able to withstand changes in external temperature during later retention.
Thus, once the ambient temperature T2 reaches the set ideal write temperature, the sum of the differences between the write temperatures T1 of all storage cells in each storage region and the ambient temperature is calculated, yielding the difference sum of every storage region.
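Steps S5 and S6 can be sketched as below: wait until T2 reaches the ideal write temperature, compute each region's difference sum, and order the regions from the largest sum to the smallest. Two assumptions are not in the text: differences are taken as absolute values, and "reaching" the ideal temperature is modelled as being within a hypothetical tolerance `tolerance_c`.

```python
from typing import Dict, List, Optional


def region_difference_sums(t1_by_region: Dict[int, List[float]],
                           t2_c: float) -> Dict[int, float]:
    # Sum over each region of |T1 - T2| (the absolute value is an assumption).
    return {rid: sum(abs(t1 - t2_c) for t1 in t1s)
            for rid, t1s in t1_by_region.items()}


def region_check_order(t1_by_region: Dict[int, List[float]], t2_c: float,
                       ideal_c: float, tolerance_c: float = 1.0) -> Optional[List[int]]:
    """Steps S5-S6: region ids ordered by difference sum, largest first.

    Returns None while T2 has not yet reached the ideal write temperature,
    meaning the caller should keep waiting. tolerance_c is hypothetical.
    """
    if abs(t2_c - ideal_c) > tolerance_c:
        return None
    sums = region_difference_sums(t1_by_region, t2_c)
    return sorted(sums, key=lambda rid: sums[rid], reverse=True)
```

For regions with T1 values {0: [25, 26], 1: [60, 10], 2: [40]} and T2 = 25 °C, the difference sums are 1, 50 and 15, so region 1 is checked first, then region 2, then region 0.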
Step S6: sort the storage regions by their difference sums and, from the largest sum to the smallest, perform the bit error rate checking step and the data recovery step on the data in every storage region.
The bit error rate checking step comprises: within a storage region, comparing the differences between the ambient temperature and the write temperature T1 of all storage cells, sorting the storage cells by the magnitude of these differences, and scanning the data of the storage cells in order from the largest difference to the smallest to obtain the bit error rate.
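The per-region checking order can be illustrated as follows. `read_error_bits` is a hypothetical hook standing in for whatever the controller's ECC decoder reports (number of erroneous bits and number of bits read); the bit error rate is computed simply as their ratio.

```python
from typing import Callable, Dict, List, Tuple


def scan_region(cells: Dict[int, float], t2_c: float,
                read_error_bits: Callable[[int], Tuple[int, int]]) -> List[Tuple[int, float]]:
    """Scan one region's cells in descending |T2 - T1| order; return (cell_id, BER) pairs.

    cells maps cell_id -> write temperature T1 (°C). read_error_bits is a
    hypothetical hook returning (bit_errors, bits_read) for one cell.
    """
    # Cells whose write temperature is furthest from the current T2 are scanned first.
    order = sorted(cells, key=lambda cid: abs(t2_c - cells[cid]), reverse=True)
    results = []
    for cid in order:
        bit_errors, bits_read = read_error_bits(cid)
        results.append((cid, bit_errors / bits_read))
    return results
```

Returning results in scan order lets a caller stop early, for example when the chip becomes busy, having already covered the cells most at risk.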
It is then judged whether the data bit error rate reaches the bit error rate threshold; if it does, data recovery is performed.
The data stored in a storage cell are handled according to how their bit error rate compares with the threshold and with the bit error rate reachable by the hardware ECC error correction capability of the storage chip.
During error correction, the hardware ECC protection mechanism reports the number of bits it can correct for each unit of data (usually between 512 bytes and 4 KB). Based on this hardware ECC error correction capability, a relative percentage threshold is set, which defines the highest bit error rate treated as reachable by the hardware ECC of the storage chip; this bit error rate is called the error correction capability of the hardware ECC. In this embodiment, taking into account the life cycle of the storage device and the characteristics of the flash memory, the hardware ECC error correction capability is set at 70-85% of the maximum correctable bit error rate.
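Under the relative-percentage reading of the ECC capability described above, the capability threshold can be computed from the per-codeword correction strength. This is a sketch under assumptions: `correctable_bits` and `codeword_bytes` are controller-specific values that the text does not give, and `fraction` is the 70-85% setting.

```python
def ecc_capability_ber(correctable_bits: int, codeword_bytes: int,
                       fraction: float = 0.8) -> float:
    """Bit error rate used as the hardware ECC 'capability' threshold.

    correctable_bits: max bits the hardware ECC corrects per codeword (controller-specific).
    codeword_bytes: data protected by one codeword (512 B to 4 KB per the text).
    fraction: relative setting; the text suggests 70-85 %.
    """
    if not 0.70 <= fraction <= 0.85:
        raise ValueError("fraction outside the suggested 70-85 % range")
    raw_limit = correctable_bits / (codeword_bytes * 8)  # highest correctable BER
    return fraction * raw_limit
```

For instance, a hypothetical ECC correcting 120 bits per 1024-byte codeword with `fraction=0.8` gives a capability threshold of 0.8 × 120 / 8192 ≈ 0.0117. Setting the threshold below the true limit leaves a safety margin, so data are handed to the additional recovery mechanisms before the hardware ECC is at the edge of failure.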
The data recovery step comprises:
When the data bit error rate has not reached the threshold, the data are reliable and no loss has occurred, so no processing is needed. This threshold is called the data reliability threshold; it is usually determined by the control chip of the storage device and ranges from 10^-5 to 10^-9.
When the data bit error rate reaches the threshold but is below the hardware ECC error correction capability of the storage chip itself, hardware ECC error correction is started, the fully recovered data are written back into a new storage cell, and the write temperature T1 at the time of rewriting is recorded.
When the data bit error rate exceeds the hardware ECC error correction capability of the storage chip itself, an additional data recovery mechanism is started to recover the data. The additional data recovery mechanisms include RAID data recovery, read retry, soft-retry data recovery, and the like.
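The three-way decision described above can be written as a small dispatch function. The enum and its labels are illustrative; routing a bit error rate exactly equal to the ECC capability to the additional recovery mechanisms is an assumption, since the text leaves that boundary open.

```python
from enum import Enum


class Action(Enum):
    NONE = "data reliable, no processing"
    ECC_REWRITE = "hardware ECC correction, rewrite to a new cell, record new T1"
    EXTRA_RECOVERY = "additional recovery: RAID, read retry, soft-retry"


def choose_action(ber: float, reliable_threshold: float, ecc_capability: float) -> Action:
    """Map a measured bit error rate to the handling described in the text."""
    if ber < reliable_threshold:   # below the data reliability threshold
        return Action.NONE
    if ber < ecc_capability:       # reached the threshold, still within hardware ECC
        return Action.ECC_REWRITE
    return Action.EXTRA_RECOVERY   # at or beyond the ECC capability (boundary assumed)
```

With a reliability threshold of 10^-6 and an ECC capability of 0.01, a bit error rate of 10^-3 leads to hardware ECC correction and a rewrite at the current temperature.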
RAID stands for Redundant Array of Inexpensive Disks. This disk redundancy technique was proposed in 1988 by Professor David Patterson and colleagues at the University of California, Berkeley. Since then, disk array technology has developed rapidly, matured, and become widely accepted. It is divided into levels RAID 0 through RAID 5, and further levels such as RAID 10, 30 and 50 were developed later. In short, the benefits of RAID are high security, high speed and very large data capacity; some RAID levels can raise speed to 400% of a single hard disk drive. A disk array connects multiple hard disk drives so that they work together, which greatly increases speed while bringing the reliability of the disk system close to error-free. Such fault-tolerant systems are both very fast and highly reliable.
RAID recovery uses parity across multiple dies (or storage cells) so that data can be recovered even if an entire die is scrapped or its data are lost.
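RAID-style recovery across dies can be illustrated with single XOR parity, as used in RAID 5. The parity scheme is an assumption, since the text only says that parity across multiple dies is used; each `bytes` object stands for the page stored on one die of a stripe.

```python
from functools import reduce
from typing import List


def xor_pages(a: bytes, b: bytes) -> bytes:
    # Byte-wise XOR of two equally sized pages.
    if len(a) != len(b):
        raise ValueError("pages in a stripe must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(pages: List[bytes]) -> bytes:
    # Parity page = XOR of the data pages held on each die of the stripe.
    return reduce(xor_pages, pages)


def rebuild_lost_page(surviving: List[bytes], parity: bytes) -> bytes:
    # XOR-ing the parity with every surviving page leaves exactly the lost page.
    return reduce(xor_pages, surviving, parity)
```

Single parity tolerates the loss of one die per stripe; schemes that survive more simultaneous failures need additional parity and are not shown here.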
Read retry (re-try): the threshold-voltage distributions of the data states in SLC, MLC or TLC flash may shift; as long as the distributions do not overlap, the data can still be recovered. Read retry attempts to read the data with different reference voltages until the data are read successfully.
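The read-retry loop can be sketched as follows. `read_page` (read with a given reference voltage) and `decode` (ECC decode, returning `None` on failure) are hypothetical hooks; real controllers expose read retry through vendor-specific commands that are not modelled here.

```python
from typing import Callable, List, Optional, Sequence


def read_retry(read_page: Callable[[float], List[int]],
               decode: Callable[[List[int]], Optional[List[int]]],
               ref_voltages: Sequence[float]) -> Optional[List[int]]:
    """Try each reference voltage in turn until the ECC decoder succeeds.

    Returns the corrected data, or None if every reference voltage fails,
    in which case the caller escalates to the next recovery mechanism.
    """
    for vref in ref_voltages:
        corrected = decode(read_page(vref))
        if corrected is not None:
            return corrected
    return None
```

In a toy model where programmed cells have lost charge so that the default reference voltage misreads them, a lower reference voltage later in `ref_voltages` can return a clean read; the loop returns `None` only after every reference voltage has failed.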
Soft-retry reads with soft information: several groups of data are read using different reference voltages and then combined to obtain the final data. This requires a more powerful ECC, such as LDPC (Low Density Parity Check Code), which was first proposed by Gallager in his doctoral thesis in the early 1960s. If the data cannot be recovered by any mechanism, the corresponding data are marked as damaged; any read of that storage cell then returns an error status indicating that the data in the area are unreadable.
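The idea behind soft-retry, combining several reads at different reference voltages into per-bit reliability information, can be sketched as below. This is a simplified illustration, not an LDPC decoder: the soft value here is just the fraction of reads that returned 1, rescaled to [-1, 1], whereas a real controller would feed log-likelihood ratios into its LDPC decoder.

```python
from typing import List


def soft_values(reads: List[List[int]]) -> List[float]:
    """Per-bit soft values in [-1, 1] from reads taken at different reference voltages.

    +1: every read returned 1; -1: every read returned 0; values near 0 flag
    unreliable bits whose threshold voltage lies between the reference levels.
    """
    if not reads:
        raise ValueError("need at least one read")
    n = len(reads)
    return [2.0 * sum(column) / n - 1.0 for column in zip(*reads)]


def hard_decision(soft: List[float]) -> List[int]:
    # Simple sign decision (ties go to 1); an LDPC decoder would use the soft values instead.
    return [1 if s >= 0.0 else 0 for s in soft]
```

The value of the extra reads lies in the magnitudes: bits with soft values near zero are the ones a soft-decision decoder is most willing to flip.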
The actions and effects of this embodiment are as follows. In the method for reliable data reading and writing of this embodiment, the write temperature T1 is recorded when data are written into the storage chip, and after writing is complete, the ambient temperature T2 of the storage chip is monitored in real time. After the ambient temperature is collected, the device judges whether the difference between T2 and the write temperature T1 exceeds the threshold t. If it does, the data saved under this environment may have changed significantly and need to be checked: the stored data are checked to obtain their bit error rate, and data recovery is carried out promptly using a repair mode chosen according to that bit error rate. The method thus obtains the temperature of the storage chip after data are written and compares it in real time with the write temperature; when the difference between the write temperature and the ambient temperature exceeds the set threshold, the stored data may have been lost, so a further check determines the exact bit error rate and a repair mode is selected accordingly.
Further, the write temperature T1 is recorded by first dividing the storage chip into multiple storage regions, each comprising multiple storage cells; then, as data are written, the temperature T1 at the moment each storage cell becomes full is recorded and written as metadata. Every most basic storage cell therefore has its own write temperature stored as metadata, so each basic storage cell can be monitored and checked, and recovered if data loss occurs.
Further, the stored data can be checked in two modes.
Mode one comprises the following operation:
The differences between the ambient temperature T2 and the write temperature T1 of all storage cells are compared and sorted by magnitude; the data of the storage cells with the largest differences are scanned and read first, and the scan yields the bit error rate of the data stored in each storage cell.
This mode is suited to situations where the storage chip is busy, that is, when data reads or writes are in progress, because a full-disk scan cannot be performed then. In such cases, this embodiment sorts the storage cells by difference and checks those with the largest differences first. This resembles battlefield triage, where the most severely injured are treated first; operating this way lets the embodiment achieve the best data detection and recovery results while the storage chip is busy.
Further, this embodiment handles data with different bit error rates according to how the bit error rate of the data stored in a storage cell compares with the threshold and with the bit error rate reachable by the hardware ECC error correction capability of the storage chip:
when the data bit error rate has not reached the threshold, the data are reliable and no loss has occurred, so no processing is needed;
when the data bit error rate reaches the threshold but is below the hardware ECC error correction capability of the storage chip itself, hardware ECC error correction is started, the fully recovered data are written back into a new storage cell, and the write temperature T1 at the time of rewriting is recorded;
when the data bit error rate exceeds the hardware ECC error correction capability of the storage chip itself, an additional data recovery mechanism is started to recover the data; the additional data recovery mechanisms include RAID data recovery, read retry, soft-retry data recovery, and the like;
if the data cannot be recovered by any mechanism, the corresponding data are marked as damaged, and any read of that storage cell returns an error status indicating that the data in the area are unreadable.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The above embodiment is a preferred example of the present invention and is not intended to limit its scope of protection.