CN112579342A - Memory error correction method, memory controller and electronic equipment - Google Patents
Memory error correction method, memory controller and electronic equipment
- Publication number
- CN112579342A CN202011461460.6A
- Authority
- CN
- China
- Prior art keywords
- memory
- error correction
- granule
- redundant
- channel
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/08—Error detection or correction by redundancy in data representation, e.g. by using checking codes
- G06F11/10—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
- G06F11/1008—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices
- G06F11/1044—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices with specific ECC/EDC distribution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0706—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
- G06F11/073—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a memory management context, e.g. virtual memory or cache management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0793—Remedial or corrective actions
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Techniques For Improving Reliability Of Storages (AREA)
- For Increasing The Reliability Of Semiconductor Memories (AREA)
Abstract
The application provides a memory error correction method, a memory controller and an electronic device. The method comprises: reading and writing a plurality of channels of the memory in unison using a precise synchronization (lockstep) mode, the memory having a redundancy ratio greater than or equal to 1 to 4; reading the data of each memory granule in each channel; and, when an erroneous memory granule is found from the read data by an ECC (error correction code) algorithm, replacing the erroneous memory granule with a redundant memory granule and marking the erroneous memory granule as unusable. Because a redundant memory granule takes over the work of the erroneous one, the channel continues to operate normally. Since each channel has a redundant memory granule available for replacement, multiple erroneous memory granules can be corrected.
Description
Technical Field
The present disclosure relates to the field of memory technologies, and in particular, to a memory error correction method, a memory controller, and an electronic device.
Background
In the field of computers, a memory is one of important components in a computer, and all programs in the computer are executed in the memory, and the memory is used for temporarily storing operation data in a Central Processing Unit (CPU) and data exchanged with an external storage such as a hard disk. As long as the computer is in operation, the CPU transfers the data to be operated to the memory for operation, and after the operation is finished, the CPU transmits the result. Therefore, the reliability of the data in the memory is crucial to the performance of the whole system, and directly influences the operation of the whole system.
Therefore, in the memory, an ECC (Error Correction Code) technology is generally adopted to implement memory Error Correction.
Although ECC has a certain error detection and error correction capability, it can only correct the case in which a single erroneous memory granule exists in the memory. If two or more erroneous memory granules exist at the same time, the ECC error correction procedure cannot recover the correct data. This limited error correction capability cannot meet the memory reliability requirements of practical applications.
Disclosure of Invention
Embodiments of the present application provide a memory error correction method, a memory controller and an electronic device, so as to solve the above problem.
An embodiment of the present application provides a memory error correction method, comprising: reading and writing a plurality of channels of the memory in unison using a precise synchronization (lockstep) mode, the memory having a redundancy ratio greater than or equal to 1 to 4; reading the data of each memory granule in each channel, the read data meeting the minimum data requirement of an ECC (error correction code) error correction algorithm; and, when an erroneous memory granule is found from the read data by the ECC error correction algorithm, replacing the erroneous memory granule with a redundant memory granule and marking the erroneous memory granule as unusable.
In this implementation, when the memory redundancy ratio is greater than or equal to 1 to 4 and the memory operates in the precise synchronization mode, each channel can have a redundant memory granule in addition to the data granules used for data storage and the ECC granules that hold the ECC check codes. When an erroneous memory granule is found, the redundant memory granule replaces it and the erroneous memory granule is marked as unusable, so the channel continues to operate normally. Since each channel has a redundant memory granule, multiple erroneous memory granules can be corrected, which better meets the memory reliability requirements of practical applications.
Further, the method comprises: when an erroneous memory granule is found by the ECC error correction algorithm and no redundant memory granule exists, correcting the erroneous memory granule using the ECC error correction algorithm.
In this implementation, when an erroneous memory granule is found and no redundant memory granule exists (that is, the redundant memory granules have been used up), the erroneous memory granule can still be corrected by the ECC error correction algorithm. This increases the number of correctable memory granules and better meets the memory reliability requirements of practical applications.
Further, when an erroneous memory granule is found by the ECC error correction algorithm, before the erroneous memory granule is replaced with a redundant memory granule, the method further comprises: determining that the ECC error correction algorithm has already been used for error correction.
In this implementation, the first erroneous memory granule found is corrected by the ECC error correction algorithm. When a further erroneous memory granule is found later, it is replaced with a redundant memory granule. Thus, on top of the one memory granule that the ECC error correction algorithm can correct, newly found erroneous memory granules are handled by the redundant memory granules, which increases the number of correctable memory granules and better meets the memory reliability requirements of practical applications.
Further, replacing the erroneous memory granule with a redundant memory granule when the erroneous memory granule is found by the ECC error correction algorithm comprises: replacing the erroneous memory granule with a redundant memory granule in the channel where the erroneous memory granule is located.
It should be appreciated that by replacing the faulty memory granule with a redundant memory granule within the channel in which the faulty memory granule resides, the replaced memory granule remains within the same channel as the normal memory granule, so that the channel identification logic of the data may not have to be altered.
Further, replacing the erroneous memory granule with a redundant memory granule also comprises: when the channel where the erroneous memory granule is located has no redundant memory granule, replacing the erroneous memory granule with a redundant memory granule from another one of the channels that are read and written in unison.
In this implementation, redundant memory granules may replace erroneous memory granules across channels, so the redundant memory granules can be fully used for error correction. Even if an erroneous memory granule appears on a channel that has no redundant memory granule left, it can be replaced with a redundant memory granule from another channel, which improves adaptability to different error scenarios.
Further, the memory is a memory with a data bit width of 32+8 bits and a storage size of a single memory grain of 4 bits; the minimum data processing unit of the ECC error correction algorithm is 128+16 bits.
For a memory with a data bit width of 32+8 bits and a single memory granule size of 4 bits, such as a DDR5 (5th-generation double data rate synchronous dynamic random access memory) memory, the minimum data processing unit of the ECC error correction algorithm is 128+16 bits, so each channel has one redundant memory granule available for error correction. The memory can therefore correct up to 3 erroneous memory granules, which increases the number of correctable memory granules compared with the existing approach that uses ECC alone, and better meets the memory reliability requirements of practical applications.
An embodiment of the present application further provides a memory controller, including an ECC error correction circuit and a precise synchronization mode circuit. The precise synchronization mode circuit is used for reading and writing a plurality of channels of the memory in unison using the precise synchronization mode; the memory is a memory with a redundancy ratio greater than or equal to 1 to 4. The ECC error correction circuit is used for reading each memory granule in each channel and, when an erroneous memory granule is found from the read data by an ECC error correction algorithm, replacing the erroneous memory granule with a redundant memory granule and marking the erroneous memory granule as unusable; the read data meets the minimum data requirement of the ECC error correction algorithm.
The memory controller can use the redundant memory granules in the channels to correct erroneous memory granules. Since each channel has a redundant memory granule available for replacement, multiple erroneous memory granules can be corrected, which improves memory reliability and better meets the memory reliability requirements of practical applications.
Further, the memory granules comprise data granules and redundant memory granules, and the memory controller also comprises a gate belonging to each channel. Each gate is connected to the data granule interfaces and the redundant granule interface of its channel. When the ECC error correction circuit finds an erroneous data granule in the channel, the gate disconnects the path to the data granule interface corresponding to the erroneous data granule and connects the path to the redundant granule interface instead. Here, a data granule interface is an interface in the memory controller used for accessing a data granule, and a redundant granule interface is an interface in the memory controller used for accessing a redundant memory granule.
In this implementation, the gate of each channel lets the memory controller switch the individual memory granules of that channel in and out, so that redundant memory granules can replace erroneous ones. The circuit is simple to implement, does not require major changes to the circuit structure of an existing memory controller, and is therefore widely applicable.
Further, each gate is also connected to the redundant granule interfaces of the other channels.
In this way, the replacement capability of redundant memory granules across channels can be provided, so that even if an error memory granule occurs on one channel but no redundant memory granule exists, the error memory granule can be replaced by using the redundant memory granules in the rest channels, thereby improving the adaptability to different error scenes.
An embodiment of the present application further provides an electronic device, including a memory and any one of the foregoing memory controllers. The memory is a memory with a redundancy ratio greater than or equal to 1 to 4. The memory controller is configured to execute any one of the above memory error correction methods to implement error correction of the memory.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
Fig. 1 is a schematic flowchart of a memory error correction method according to an embodiment of the present application;
Fig. 2 is a schematic diagram illustrating a basic structure of a memory controller according to an embodiment of the present application;
Fig. 3 is a schematic diagram of a gate connection structure of a memory controller according to an embodiment of the present application;
Fig. 4 is a schematic diagram of another gate connection structure of a memory controller according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
Fig. 6 is a schematic diagram illustrating an initial state of a DDR5 memory according to an embodiment of the present application;
Fig. 7 is a schematic diagram illustrating the channel status when an erroneous memory granule occurs on each of the two channels of a DDR5 memory according to an embodiment of the present application;
Fig. 8 is a schematic diagram of the channel status when 3 erroneous memory granules occur on channel 0 of a DDR5 memory according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
The first embodiment is as follows:
An embodiment of the present application provides a memory error correction method which, as shown in Fig. 1, includes:
S101: reading and writing a plurality of channels of the memory in unison using a precise synchronization mode.
It should be noted that the precise synchronization mode, also called lockstep mode (Lockstep Channel Mode), uses identical redundant hardware components to process the same instruction at the same time, so that the data of one CPU cache line is distributed across several memory channels.
Once the precise synchronization mode is in use, a plurality of channels of the memory store data belonging to the same CPU cache line, which makes it possible to share memory granules among the channels.
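The distribution of one cache line across channels can be pictured with a small sketch. This is only an illustration: the byte-wise round-robin interleaving, the 64-byte line size in the usage note, and the function names `stripe_cache_line` and `gather_cache_line` are assumptions made for the example, not details given in this application.

```python
from typing import List


def stripe_cache_line(line: bytes, num_channels: int) -> List[bytes]:
    """Split one cache line across channels, byte by byte (assumed interleaving)."""
    if num_channels <= 0 or len(line) % num_channels != 0:
        raise ValueError("cache line must divide evenly across the channels")
    # Channel k receives bytes k, k + num_channels, k + 2 * num_channels, ...
    return [line[ch::num_channels] for ch in range(num_channels)]


def gather_cache_line(parts: List[bytes]) -> bytes:
    """Reassemble the cache line from the per-channel pieces."""
    num_channels = len(parts)
    out = bytearray(sum(len(p) for p in parts))
    for ch, part in enumerate(parts):
        out[ch::num_channels] = part
    return bytes(out)
```

For a hypothetical 64-byte line and two channels, each channel holds 32 bytes, and both channels must be accessed together to read or write the line, which is the property that allows granules to be shared between channels.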
S102: reading the data of each memory granule in each channel.
It should be noted that, in the embodiment of the present application, the read data should meet the minimum data requirement of the ECC error correction algorithm, so that the ECC error correction algorithm operates normally, and can identify errors in the read data, thereby locating whether there are erroneous memory granules, and specifically which memory granule has an error.
S103: when an erroneous memory granule is found from the read data by an ECC (error correction code) error correction algorithm, replacing the erroneous memory granule with a redundant memory granule and marking the erroneous memory granule as unusable.
It should be noted that, in the embodiment of the present application, a chipkill algorithm may be used to replace an erroneous memory granule with a redundant memory granule.
Furthermore, in the embodiment of the present application, the ECC error correction algorithm may be implemented by, but not limited to, an RS (144, 128) algorithm.
It should be noted that, in the embodiment of the present application, the memory should be a memory with a redundancy ratio of 1 to 4 or more. The data bit width of a memory consists of two parts: the bit width of the data portion and the bit width of the redundant portion. For example, for a conventional DDR4 memory, the data bit width is 64+8 bits, that is, 64 data bits plus 8 redundancy bits. For a conventional DDR5 memory, the data bit width is 32+8 bits, that is, 32 data bits plus 8 redundancy bits.
The redundancy ratio is the ratio of the bit width of the redundant bit to the bit width of the data bit in the memory data bit width. For example, for DDR4 memory, the redundancy ratio is 1 to 8; for DDR5 memory, the redundancy ratio is 1 to 4.
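As a quick check of the two ratios quoted above, the calculation can be written out as follows. This is a minimal sketch; the function names `redundancy_ratio` and `meets_threshold` are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction


def redundancy_ratio(data_bits: int, redundant_bits: int) -> Fraction:
    """Ratio of redundant bit width to data bit width."""
    return Fraction(redundant_bits, data_bits)


def meets_threshold(data_bits: int, redundant_bits: int) -> bool:
    """True when the redundancy ratio is at least 1 to 4, as the method requires."""
    return redundancy_ratio(data_bits, redundant_bits) >= Fraction(1, 4)


ddr4 = redundancy_ratio(64, 8)  # 64+8 bits, reduces to 1/8
ddr5 = redundancy_ratio(32, 8)  # 32+8 bits, reduces to 1/4
```

Under this definition a 64+8-bit DDR4 memory falls below the 1 to 4 threshold, while a 32+8-bit DDR5 memory meets it exactly.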
It should be further noted that the minimum data requirement of the ECC error correction algorithm (hereinafter referred to as the ECC word) is 128+16 bits. For a DDR4 memory, all of the redundancy bits are therefore needed to hold the check data of the ECC error correction algorithm, so the memory has no redundant memory granule.
For a memory with a redundancy ratio of 1 to 4, such as a DDR5 memory, each channel still has one redundant memory granule left after the check data of the ECC error correction algorithm has been stored. Therefore, in the embodiment of the present application, the memory is a memory with a redundancy ratio of 1 to 4 or more.
It should be understood that, in the embodiment of the present application, when data of each memory granule in each channel is read, the read data should meet the requirement of the ECC Word, so that the ECC error correction algorithm can process the data.
For example, assume that the ECC word is 128+16 bits and that the memory is a two-channel DDR5 memory whose individual memory granules are 4 bits wide. The two channels can be read twice to obtain 128+32 bits of data, which satisfies the 128+16-bit requirement of the ECC word. Each of the two channels then has one redundant memory granule, so the memory has two redundant memory granules.
Likewise, for a four-channel DDR5 memory whose individual memory granules are 4 bits wide, reading the 4 channels once yields 128+32 bits of data, which satisfies the 128+16-bit requirement of the ECC word. Each of the 4 channels then has one redundant memory granule, so the memory has 4 redundant memory granules.
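The arithmetic behind these examples can be sketched as below. The helper `granule_budget` and its parameter names are hypothetical. The sketch assumes that every granule contributes its 4 bits on each read, so that the bits left over after the 16 check bits translate directly into whole redundant granules.

```python
import math
from typing import NamedTuple


class GranuleBudget(NamedTuple):
    reads: int            # reads of all channels needed to fill one ECC word
    data_bits: int        # data bits obtained by those reads
    redundant_bits: int   # redundancy bits obtained by those reads
    spare_granules: int   # granules left after storing the ECC check bits


def granule_budget(channels: int, data_bits: int = 32, redundant_bits: int = 8,
                   granule_bits: int = 4, ecc_data_bits: int = 128,
                   ecc_check_bits: int = 16) -> GranuleBudget:
    """Count the reads and spare granules for one ECC word (illustrative only)."""
    per_read_data = channels * data_bits
    reads = math.ceil(ecc_data_bits / per_read_data)
    total_data = reads * per_read_data
    if total_data != ecc_data_bits:
        raise ValueError("reads do not line up with exactly one ECC word")
    total_redundant = reads * channels * redundant_bits
    spare_bits = total_redundant - ecc_check_bits
    # Each granule supplies granule_bits per read, hence granule_bits * reads per word.
    bits_per_granule = granule_bits * reads
    return GranuleBudget(reads, total_data, total_redundant,
                         spare_bits // bits_per_granule)
```

With the defaults, `granule_budget(2)` reports two reads, 128+32 bits and 2 spare granules, and `granule_budget(4)` reports one read, 128+32 bits and 4 spare granules. Passing `data_bits=64, redundant_bits=8` with a single channel models the DDR4 case and leaves no spare granule.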
It should be noted that, in the embodiment of the present application, the error correction capability of the ECC error correction algorithm itself for an erroneous memory granule may also be combined, so as to further improve the error correction capability for the memory.
For example, for a two-channel DDR5 memory whose individual memory granules are 4 bits wide, each channel has one redundant memory granule once the two channels are read and written in unison using the precise synchronization mode. Two erroneous memory granules can then be handled by replacing them with the redundant memory granules. In addition, the ECC error correction algorithm can correct one more erroneous memory granule.
In a possible implementation manner of the embodiment of the present application, an ECC error correction algorithm may be first used to correct a first discovered erroneous memory granule, and then, for a later discovered erroneous memory granule, error correction is implemented by using a redundant memory granule instead of the erroneous memory granule.
Alternatively, in another possible implementation of the embodiment of the present application, once an erroneous memory granule is detected, a redundant memory granule is used to replace it first, and only after the redundant memory granules have been used up is the ECC error correction algorithm used to correct a newly found erroneous memory granule.
It should be noted that, in this embodiment, it may be configured to only use the redundant memory granule in the channel where the erroneous memory granule is located to replace the erroneous memory granule, so as to achieve the effect of error correction by using the redundant memory granule in the channel.
In this configuration, after the redundant memory granule in the channel has been used up (that is, no redundant memory granule remains in the channel), if another erroneous memory granule is found in the channel and the ECC error correction algorithm has not yet been used to correct any memory granule, the ECC error correction algorithm may be used to correct the erroneous memory granule. However, if the ECC error correction algorithm has already been used to correct a memory granule, the erroneous memory granule cannot be corrected even if a redundant memory granule exists in another channel. In that case a memory failure may be reported so that an engineer can repair or replace the memory.
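The in-channel-only policy just described amounts to a three-way decision, sketched below. The names `Action` and `handle_error` are hypothetical, and the two boolean inputs stand in for state that a real controller would track itself.

```python
from enum import Enum


class Action(Enum):
    REPLACE_WITH_REDUNDANT = "replace with the channel's redundant granule"
    CORRECT_WITH_ECC = "correct with the ECC error correction algorithm"
    REPORT_FAULT = "report a memory failure"


def handle_error(channel_has_redundant: bool, ecc_already_used: bool) -> Action:
    """Decide what to do with a newly found erroneous granule (in-channel policy)."""
    if channel_has_redundant:
        return Action.REPLACE_WITH_REDUNDANT
    if not ecc_already_used:
        return Action.CORRECT_WITH_ECC
    # Redundant granules in other channels are deliberately ignored in this configuration.
    return Action.REPORT_FAULT
```

The last branch is what separates this configuration from the cross-channel one: a redundant granule elsewhere does not help once the local one is gone and ECC has already been spent.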
Alternatively, the present embodiment may be configured so that redundant memory granules can replace erroneous memory granules across channels. That is, when the channel where the erroneous memory granule is found has no redundant memory granule, a redundant memory granule from another one of the channels that are read and written in unison can be used to replace it. All redundant memory granules can then be fully used, so the scheme can cope with a wider variety of memory granule failures.
For example, as shown in FIG. 8, after the redundant memory granule in channel 0 is used, the redundant memory granule in channel 1 may be used to replace the erroneous memory granule.
It should be noted that, in the embodiment of the present application, when the redundant memory granule is allowed to be used to replace the erroneous memory granule across the channels, the redundant memory granule in the channel where the erroneous memory granule is located may be preferentially used to replace the erroneous memory granule in the manner described above, and when the redundant memory granule does not exist in the channel where the erroneous memory granule is located, the redundant memory granule in another channel may be used to replace the erroneous memory granule.
However, the order in which redundant memory granules are used is not limited. That is, even when the channel where the erroneous memory granule is located still has a redundant memory granule, a redundant memory granule in another channel may be used to replace the erroneous memory granule.
It should be noted that, in the embodiment of the present application, once a redundant memory granule has replaced an erroneous memory granule, it performs the function of the erroneous memory granule and no longer counts as a redundant memory granule. The erroneous memory granule, having been marked as unusable (for example, marked as faulty or unavailable in the memory controller), is discarded and does not become a redundant memory granule after being replaced. Redundant memory granules are thus "consumables" in the embodiments of the present application: a granule loses its "redundant memory granule" label as soon as it is used.
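The "consumable" behaviour, together with the preference for a redundant granule in the same channel, can be sketched as a small bookkeeping structure. The class `SparePool`, its fields and the granule names in the usage note are hypothetical and only illustrate the rules above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class SparePool:
    spares: Dict[int, List[str]]                          # channel -> unused redundant granules
    retired: Set[str] = field(default_factory=set)        # granules marked as unusable
    remap: Dict[str, str] = field(default_factory=dict)   # erroneous granule -> replacement

    def replace(self, bad: str, channel: int,
                allow_cross_channel: bool = True) -> Optional[str]:
        """Replace `bad`, trying its own channel first; None if no spare is left."""
        order = [channel]
        if allow_cross_channel:
            order += [c for c in sorted(self.spares) if c != channel]
        for ch in order:
            if self.spares.get(ch):
                spare = self.spares[ch].pop(0)  # consumed: no longer a redundant granule
                self.retired.add(bad)           # the erroneous granule is discarded
                self.remap[bad] = spare
                return spare
        return None
```

Starting from `SparePool({0: ["ch0-r"], 1: ["ch1-r"]})`, three successive errors on channel 0 would take `ch0-r`, then `ch1-r`, and then find no redundant granule left, at which point the ECC error correction algorithm would have to handle the third error.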
The embodiment of the present application further provides a memory controller, as shown in Fig. 2. The memory controller may include an ECC error correction circuit and a precise synchronization mode circuit, wherein:
The precise synchronization mode circuit can be used for reading and writing a plurality of channels of the memory in unison using the precise synchronization mode.
The ECC error correction circuit may be configured to read each memory granule in each channel and, when an erroneous memory granule is found from the read data by an ECC error correction algorithm, replace the erroneous memory granule with a redundant memory granule and mark the erroneous memory granule as unusable.
It should be noted that, in the memory, the memory granules include a data granule for storing data, an ECC granule for storing check data of an ECC error correction algorithm, and a redundancy granule (i.e., a redundant memory granule) remaining except for the data granule and the ECC granule.
In the embodiment of the present application, in order to better implement the replacement of the erroneous memory granule by the redundant memory granule, it can be seen from fig. 3 that the memory controller further includes gates respectively belonging to the channels.
Each gate is connected to the data granule interfaces and the redundant granule interface of its channel. When the ECC error correction circuit finds an erroneous data granule in the channel, the gate disconnects the path to the data granule interface corresponding to the erroneous data granule and connects the path to the redundant granule interface instead.
It should be understood that in most cases the memory and memory controller are not directly connected, but that no matter what circuits are present between the memory and memory controller, data interaction between the memory and memory controller may be implemented, i.e., each memory granule of the memory will still be accessed by the memory controller. In the embodiment of the present application, an interface used for accessing data particles in the memory controller is a data particle interface, and an interface used for accessing redundant particles is a redundant particle interface.
It should be understood that in the embodiment of the present application, each gate may be connected only to the data granule interface and the redundant granule interface in the present channel, and not to the redundant granule interfaces in the remaining channels, such as shown in fig. 3.
Alternatively, in this embodiment, each gate may be connected to the data granule interfaces and the redundant granule interface of its own channel and also to the redundant granule interfaces of the other channels, for example as shown in Fig. 4. In that case, after an erroneous memory granule is found in a channel, the memory controller can use a redundant memory granule in one of the other channels to replace it.
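The switching behaviour of a gate can be modelled in software as a routing table. This is a behavioural sketch only, not a description of the circuit: the class `Gate`, its methods and the interface names are assumptions made for the illustration.

```python
from typing import Dict, List


class Gate:
    """Per-channel gate: maps each data granule interface to the interface actually driven."""

    def __init__(self, data_interfaces: List[str], redundant_interfaces: List[str]):
        # Initially every data granule interface is routed to itself.
        self.route: Dict[str, str] = {name: name for name in data_interfaces}
        # Redundant interfaces this gate is wired to (own channel first, then others).
        self.free_redundant: List[str] = list(redundant_interfaces)

    def switch_out(self, bad_interface: str) -> bool:
        """Disconnect an erroneous data granule and connect a redundant one instead."""
        if bad_interface not in self.route:
            raise KeyError(bad_interface)
        if not self.free_redundant:
            return False  # nothing left to switch in
        self.route[bad_interface] = self.free_redundant.pop(0)
        return True
```

A gate wired as in Fig. 3 would be built with only its own channel's redundant interface, while one wired as in Fig. 4 would also list the redundant interfaces of the other channels. In the latter case a real controller would additionally have to make sure that two gates never claim the same redundant granule, which this sketch does not model.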
It should be noted that, in the embodiment of the present application, the ECC error correction circuit may further implement an ECC error correction algorithm, so as to implement error correction on the memory.
It should be further noted that, in the embodiment of the present application, the ECC error correction circuit and the precise synchronization mode circuit may be implemented by using an existing ECC error correction circuit and an existing precise synchronization mode circuit. The gate can be realized by a common multi-way gating circuit.
In an embodiment of the present application, an electronic device is further provided, as shown in fig. 5, which includes a memory and a memory controller.
The memory is a memory with a redundancy ratio greater than or equal to 1 to 4, such as DDR5 memory.
The memory controller may adopt the aforementioned memory controller, so that the error correction of the memory may be implemented according to the memory error correction method provided in the embodiment of the present application.
In the embodiment of the present application, the electronic device may be an electronic device such as a mobile phone, a computer, a server, and the like, which has a memory and a memory controller.
It should be understood that the memory with the highest redundancy ratio currently on the market is the DDR5 memory, and the redundancy ratio is 1 to 4. If a memory with larger redundancy ratio is developed in the future, the scheme provided in the embodiment of the present application can also be used for implementation.
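As a rough illustration of the redundancy ratio condition, the following Python sketch compares redundant bits to data bits for one channel. The function names are hypothetical, and the 64+8 layout used as a counter-example is the common DDR4 ECC module width, which is an assumption added here for contrast and is not taken from the application.

```python
from fractions import Fraction


def redundancy_ratio(data_bits, redundant_bits):
    """Ratio of redundant bits to data bits, kept exact as a fraction."""
    return Fraction(redundant_bits, data_bits)


def meets_requirement(data_bits, redundant_bits, minimum=Fraction(1, 4)):
    """True if the redundancy ratio is greater than or equal to 1 to 4."""
    return redundancy_ratio(data_bits, redundant_bits) >= minimum


if __name__ == "__main__":
    # DDR5 channel: 32 data bits + 8 redundant bits.
    print(redundancy_ratio(32, 8), meets_requirement(32, 8))
    # Assumed contrast case: 64 data bits + 8 redundant bits.
    print(redundancy_ratio(64, 8), meets_requirement(64, 8))
```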
According to the memory error correction method, the memory controller and the electronic device provided by the embodiments of the present application, erroneous memory granules can be corrected by using the redundant memory granules in the channels. Because each channel has a redundant memory granule available as a replacement, a plurality of erroneous memory granules can be corrected, which improves memory reliability and better meets the memory reliability requirements of practical applications.
Example two:
This embodiment takes a specific memory error correction process applied to DDR5 memory as an example to illustrate the present application.
The DDR5 memory is addressed in a unified manner by using the precise synchronization mode: the 2 channels (channel 0 and channel 1) are read and written together, and data is distributed across the 2 channels.
The ECC error correction algorithm uses the RS (144, 128) algorithm, and the ECC word is 128+16 bits.
The data bit width of a DDR5 memory channel is 32+8 bits, so each unified read of the 2 channels for one ECC word yields 128+32 bits. The ECC error correction algorithm uses only 16 of the 32 redundant bits, so 16 bits of redundant data are left over across the 2 channels, that is, 8 bits per channel. In other words, each channel has one redundant memory granule.
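The bit budget above can be checked with a few lines of arithmetic. The Python sketch below is only a worked calculation under stated assumptions: each 4-bit granule is read twice per ECC word, and the RS code uses 8-bit symbols, so that the standard Reed-Solomon bound of half the check symbols applies. All variable names are made up for this example.

```python
CHANNELS = 2
DATA_BITS_PER_CHANNEL = 32        # DDR5 channel data width
REDUNDANT_BITS_PER_CHANNEL = 8    # DDR5 channel redundant width
READS_PER_ECC_WORD = 2            # assumption: two reads form one ECC word
GRANULE_WIDTH = 4                 # bits per memory granule
RS_N, RS_K = 144, 128             # RS (144, 128), sizes in bits
SYMBOL_BITS = 8                   # assumption: 8-bit RS symbols

data_bits = CHANNELS * DATA_BITS_PER_CHANNEL * READS_PER_ECC_WORD
redundant_bits = CHANNELS * REDUNDANT_BITS_PER_CHANNEL * READS_PER_ECC_WORD
check_bits = RS_N - RS_K                      # redundant bits used by ECC
spare_bits = redundant_bits - check_bits      # redundant bits left over
spare_bits_per_channel = spare_bits // CHANNELS

# One granule contributes GRANULE_WIDTH bits per read.
bits_per_granule_per_word = GRANULE_WIDTH * READS_PER_ECC_WORD
spare_granules_per_channel = spare_bits_per_channel // bits_per_granule_per_word

# A Reed-Solomon code corrects up to half its check symbols.
correctable_symbols = (check_bits // SYMBOL_BITS) // 2

if __name__ == "__main__":
    print(data_bits, redundant_bits, spare_bits, spare_granules_per_channel)
    print(correctable_symbols)
```

Under these assumptions one granule maps to exactly one 8-bit symbol, which is consistent with the ECC algorithm alone correcting a single erroneous granule.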
In the initial state, as shown in fig. 6, the 18 memory granules D0, D1, D2, D3, D4, D5, D6, D7 and C0 in the two channels, each 4 bits wide, are read twice to form a complete ECC word. The C1 memory granules in both channels are redundant memory granules.
The RS (144, 128) x8 chipkill algorithm is used (an x8 chipkill algorithm is a chipkill algorithm for an 8-bit size). Suppose the algorithm finds that one memory granule in channel 0 is in error; fig. 7 illustrates the case where the D0 memory granule of channel 0 is in error. The C1 memory granule of channel 0 may now be used to replace the erroneous D0 memory granule. All subsequent read and write commands to the D0 memory granule are directed to the C1 memory granule, and the D0 memory granule is retired and no longer used.
At this time, the ECC word is composed of D1, D2, D3, D4, D5, D6, D7, C0, C1 of channel 0 and D0, D1, D2, D3, D4, D5, D6, D7, C0 of channel 1.
It should be understood that fig. 7 merely illustrates the case where the D0 memory granule fails. In fact, if any of the 8 memory granules D0, D1, D2, D3, D4, D5, D6, D7 of channel 0 fails, it can be replaced with the C1 memory granule.
Under the new ECC word, suppose channel 1 is also found to have an erroneous memory granule; FIG. 7 shows the D3 memory granule of channel 1 in error. The C1 memory granule of channel 1 may be used to replace the erroneous D3 memory granule. Thereafter, all read and write commands to the D3 memory granule of channel 1 are directed to the C1 memory granule of channel 1, and the D3 memory granule of channel 1 is retired and no longer used.
At this time, the ECC word is composed of D1, D2, D3, D4, D5, D6, D7, C0, C1 of channel 0 and D0, D1, D2, D4, D5, D6, D7, C0, C1 of channel 1.
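The way the ECC word membership changes after each replacement can be reproduced with a short sketch. The Python below is a minimal illustration: the function name `ecc_word` and the dictionary-based bookkeeping are assumptions for this example, while the granule names follow the text.

```python
# Granules that form the ECC word in the initial state, per channel.
INITIAL_GRANULES = [f"D{i}" for i in range(8)] + ["C0"]
SPARE = "C1"  # redundant memory granule of each channel


def ecc_word(replaced):
    """Return the ECC word as a list of (channel, granule) pairs.

    `replaced` maps a channel number to the erroneous granule in that
    channel that has been replaced by the channel's own spare.
    """
    word = []
    for channel in (0, 1):
        members = list(INITIAL_GRANULES)
        faulty = replaced.get(channel)
        if faulty is not None:
            members.remove(faulty)   # the erroneous granule is retired
            members.append(SPARE)    # the spare takes its place
        word.extend((channel, name) for name in members)
    return word


if __name__ == "__main__":
    for state in ({}, {0: "D0"}, {0: "D0", 1: "D3"}):
        word = ecc_word(state)
        # 4 bits per granule, read twice, gives the ECC word size in bits.
        print(state, len(word), len(word) * 4 * 2)
        print([name for ch, name in word if ch == 0])
        print([name for ch, name in word if ch == 1])
```

In every state the word keeps 18 granules, so the code size stays at 144 bits and the same ECC algorithm can continue to be applied to the new word.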
The newly formed ECC word can still correct one more erroneous memory granule based on the ECC error correction algorithm. That erroneous memory granule may be on channel 0 or on channel 1.
In extreme cases, all three erroneous memory granules may be on the same channel. For example, as shown in FIG. 8, the three erroneous memory granules are on channel 0. In such a situation, with the solution of this embodiment, the first erroneous memory granule may be replaced with the C1 memory granule of channel 0. When the second erroneous memory granule of channel 0 is found, it may be determined whether the C1 memory granule of channel 1 is currently in use. In this example, the C1 memory granule of channel 1 is not in use, so it can be used to replace the second erroneous memory granule on channel 0. The third erroneous memory granule can then be corrected by the ECC error correction algorithm.
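The decision order in this scenario (own-channel spare first, then the other channel's spare, then ECC correction only) can be sketched as follows. This is a hedged illustration, not the controller's actual logic: the class `SparePolicy`, its method `on_fault`, and the specific granules D0, D2 and D5 chosen as the three failures are all assumptions for this example.

```python
class SparePolicy:
    """Hypothetical bookkeeping of which C1 spares are still free."""

    def __init__(self, channels=(0, 1)):
        self.spare_free = {ch: True for ch in channels}
        self.actions = []

    def on_fault(self, channel, granule):
        """Decide how to handle an erroneous granule found by the ECC."""
        # Prefer the spare in the granule's own channel, then the others.
        candidates = [channel] + [c for c in self.spare_free if c != channel]
        for ch in candidates:
            if self.spare_free[ch]:
                self.spare_free[ch] = False
                action = ("replace", (channel, granule), (ch, "C1"))
                break
        else:
            # No spare left: rely on the ECC error correction algorithm.
            action = ("ecc_correct", (channel, granule), None)
        self.actions.append(action)
        return action


if __name__ == "__main__":
    policy = SparePolicy()
    # Three erroneous granules, all on channel 0 (granule names assumed).
    for name in ("D0", "D2", "D5"):
        print(policy.on_fault(0, name))
```

The sketch does not model the limit of the ECC algorithm itself: once both spares are used, only one further erroneous granule can be corrected at a time.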
The scheme provided by this embodiment uses the precise synchronization mode for the DDR5 memory, can correct up to 3 memory granules within 128 bits of data, and supports various error scenarios. Compared with the traditional scheme, which relies only on the ECC error correction algorithm and can correct only 1 memory granule, this scheme significantly improves the error correction capability and the reliability of the DDR memory.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
In addition, units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
Furthermore, the functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
In this context, a plurality means two or more.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.
Claims (10)
1. A memory error correction method is characterized by comprising the following steps:
uniformly reading and writing a plurality of channels of the memory by using a precise synchronization mode; the memory is a memory with a redundancy ratio of more than or equal to 1 to 4;
reading data of each memory particle in each channel; the read data meets the minimum data requirement of an ECC (error correction code) error correction algorithm;
and when an erroneous memory granule is found through the ECC (error correction code) error correction algorithm according to the read data, replacing the erroneous memory granule with a redundant memory granule, and retiring the erroneous memory granule.
2. The memory error correction method of claim 1, further comprising:
and when an erroneous memory granule is found through the ECC error correction algorithm and no redundant memory granule exists, correcting the erroneous memory granule by using the ECC error correction algorithm.
3. The memory error correction method of claim 1, wherein when the erroneous memory granule is found by the ECC error correction algorithm, before the redundant memory granule is used to replace the erroneous memory granule, the method further comprises:
determining that the ECC error correction algorithm has been used for error correction.
4. The memory error correction method of claim 1, wherein replacing the erroneous memory granule with a redundant memory granule when the erroneous memory granule is found by the ECC error correction algorithm comprises:
when the erroneous memory granule is found through the ECC error correction algorithm, replacing the erroneous memory granule with a redundant memory granule in the channel where the erroneous memory granule is located.
5. The memory error correction method of claim 4, wherein replacing the erroneous memory granule with a redundant memory granule further comprises:
when the channel where the erroneous memory granule is located has no redundant memory granule, replacing the erroneous memory granule with a redundant memory granule in another one of the uniformly read and written channels, other than the channel where the erroneous memory granule is located.
6. The memory error correction method according to any one of claims 1 to 5, wherein the memory is a memory with a data bit width of 32+8 bits and a storage size of a single memory granule of 4 bits; the minimum data processing unit of the ECC error correction algorithm is 128+16 bits.
7. A memory controller, comprising: an ECC error correction circuit and a precise synchronization mode circuit;
the precise synchronization mode circuit is used for uniformly reading and writing a plurality of channels of the memory by using a precise synchronization mode; the memory is a memory with a redundancy ratio of more than or equal to 1 to 4;
the ECC error correction circuit is used for reading each memory granule in each channel and, when an erroneous memory granule is found through an ECC error correction algorithm according to the read data, replacing the erroneous memory granule with a redundant memory granule and retiring the erroneous memory granule; wherein the read data meets the minimum data requirement of the ECC error correction algorithm.
8. The memory controller of claim 7, wherein the memory granules comprise data granules and redundant memory granules; the memory controller further comprises a gate belonging to each channel;
the gate is connected to the data granule interfaces and the redundant granule interface of its channel, and is used for, when the ECC error correction circuit finds an erroneous data granule in the channel, disconnecting the path to the data granule interface corresponding to the erroneous data granule and connecting the path to the redundant granule interface; wherein:
the data granule interface is an interface in the memory controller used for accessing data granules; the redundant granule interface is an interface in the memory controller used for accessing redundant memory granules.
9. The memory controller of claim 8, wherein each gate is further connected to the redundant granule interfaces of the remaining channels other than the channel to which the gate belongs.
10. An electronic device, comprising: a memory and a memory controller;
the memory is a memory with a redundancy ratio of more than or equal to 1 to 4;
the memory controller is configured to execute the memory error correction method according to any one of claims 1 to 6, so as to implement error correction on the memory.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011461460.6A CN112579342B (en) | 2020-12-07 | 2020-12-07 | Memory error correction method, memory controller and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112579342A (en) | 2021-03-30 |
CN112579342B (en) | 2024-02-13 |
Family
ID=75131557
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011461460.6A Active CN112579342B (en) | 2020-12-07 | 2020-12-07 | Memory error correction method, memory controller and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112579342B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114518972A (en) * | 2022-02-14 | 2022-05-20 | 海光信息技术股份有限公司 | Memory error processing method and device, memory controller and processor |
WO2023179634A1 (en) * | 2022-03-22 | 2023-09-28 | 华为技术有限公司 | Data writing method and processing system |
WO2023202592A1 (en) * | 2022-04-19 | 2023-10-26 | 华为技术有限公司 | Data writing method and processing system |
WO2023236996A1 (en) * | 2022-06-08 | 2023-12-14 | 华为技术有限公司 | Memory module and electronic device |
WO2024198468A1 (en) * | 2023-03-31 | 2024-10-03 | 华为技术有限公司 | Memory error correction method, memory module, memory controller, and processor |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2001297594A (en) * | 2000-04-06 | 2001-10-26 | Hewlett Packard Co <Hp> | Method for providing capability of reprogramming memory redundancy in field |
CN102332302A (en) * | 2011-07-19 | 2012-01-25 | 北京时代全芯科技有限公司 | Phase change memory and redundancy replacing method for same |
CN103295649A (en) * | 2013-04-28 | 2013-09-11 | 上海宏力半导体制造有限公司 | Method for improving reliability of a nonvolatile memory |
CN109328340A (en) * | 2017-09-30 | 2019-02-12 | 华为技术有限公司 | Detection method, device and the server of memory failure |
CN111294059A (en) * | 2019-12-26 | 2020-06-16 | 成都海光集成电路设计有限公司 | Encoding method, decoding method, error correction method and related device |
CN111312321A (en) * | 2020-03-02 | 2020-06-19 | 电子科技大学 | Memory device and fault repairing method thereof |
CN111459712A (en) * | 2020-04-16 | 2020-07-28 | 上海安路信息科技有限公司 | SRAM type FPGA single event upset error correction method and single event upset error correction circuit |
Non-Patent Citations (1)
Title |
---|
HUANG XIAOHE等: "In-memory computing to break the memory wall", 《CHINESE PHYSICS B》 * |
Also Published As
Publication number | Publication date |
---|---|
CN112579342B (en) | 2024-02-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112579342B (en) | Memory error correction method, memory controller and electronic equipment | |
US10824499B2 (en) | Memory system architectures using a separate system control path or channel for processing error information | |
US10002043B2 (en) | Memory devices and modules | |
US8341499B2 (en) | System and method for error detection in a redundant memory system | |
US7428689B2 (en) | Data memory system and method for transferring data into a data memory | |
US7587658B1 (en) | ECC encoding for uncorrectable errors | |
US8103900B2 (en) | Implementing enhanced memory reliability using memory scrub operations | |
KR102378466B1 (en) | Memory devices and modules | |
US8566672B2 (en) | Selective checkbit modification for error correction | |
US9262284B2 (en) | Single channel memory mirror | |
US9785570B2 (en) | Memory devices and modules | |
US11409601B1 (en) | Memory device protection | |
US12032443B2 (en) | Shadow DRAM with CRC+RAID architecture, system and method for high RAS feature in a CXL drive | |
CN115729746A (en) | Data storage protection method based on CRC and ECC | |
CN111142797B (en) | Solid state disk refreshing method and device and solid state disk | |
US20240004791A1 (en) | Controller cache architeture | |
CN114153402B (en) | Memory and data reading and writing method thereof | |
CN111383701B (en) | Redundancy error correction structure of OTP | |
CN109753239B (en) | Semiconductor memory module, semiconductor memory system, and method of accessing the same | |
US20240086090A1 (en) | Memory channel disablement | |
US20240096439A1 (en) | Selective per die dram ppr for cxl type 3 device | |
CN113495674B (en) | Read-write method and memory device | |
WO2023208228A1 (en) | Storage device and data processing method | |
CN113495677B (en) | Read-write method and memory device | |
US20230386598A1 (en) | Methods for real-time repairing of memory failures caused during operations, memory systems performing repairing methods, and data processing systems including repairing memory systems |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||