CN112579342A - Memory error correction method, memory controller and electronic equipment


Info

Publication number
CN112579342A
Authority
CN
China
Prior art keywords
memory
error correction
granule
redundant
channel
Legal status
Granted
Application number
CN202011461460.6A
Other languages
Chinese (zh)
Other versions
CN112579342B (en)
Inventor
周鹏
Current Assignee
Haiguang Information Technology Co Ltd
Original Assignee
Haiguang Information Technology Co Ltd
Application filed by Haiguang Information Technology Co Ltd
Priority to CN202011461460.6A
Publication of CN112579342A
Application granted
Publication of CN112579342B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1008 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices
    • G06F11/1044 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices with specific ECC/EDC distribution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/073 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a memory management context, e.g. virtual memory or cache management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793 Remedial or corrective actions

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Techniques For Improving Reliability Of Storages (AREA)
  • For Increasing The Reliability Of Semiconductor Memories (AREA)

Abstract

The application provides a memory error correction method, a memory controller and an electronic device. The method comprises: uniformly reading and writing multiple channels of a memory in a precise synchronization (lockstep) mode, where the memory has a redundancy ratio of at least 1:4; reading the data of each memory granule in each channel; and, when an erroneous memory granule is located by an ECC (error correction code) algorithm from the read data, replacing it with a redundant memory granule and marking the erroneous granule as unusable. Because a redundant memory granule takes over the work of the erroneous one, the channel continues to operate normally, and because every channel has a redundant memory granule available for replacement, multiple erroneous memory granules can be corrected.

Description

Memory error correction method, memory controller and electronic equipment
Technical Field
The present disclosure relates to the field of memory technologies, and in particular, to a memory error correction method, a memory controller, and an electronic device.
Background
In computers, memory is one of the most important components. All programs run in memory, which temporarily holds the operands of the Central Processing Unit (CPU) and the data exchanged with external storage such as a hard disk. While the computer is running, the CPU moves the data to be processed into memory, operates on it, and then transmits the result. The reliability of the data in memory is therefore critical to the performance of the whole system and directly affects its operation.
For this reason, memories generally use ECC (Error Correction Code) technology to implement memory error correction.
Although ECC provides a certain error detection and error correction capability, that capability only covers the case in which a single memory granule is erroneous: if two or more memory granules are erroneous, the ECC correction procedure cannot recover the correct data. This limited correction capability cannot meet the memory reliability requirements of practical applications.
Disclosure of Invention
Embodiments of the present application provide a memory error correction method, a memory controller and an electronic device to solve the above problem.
An embodiment of the present application provides a memory error correction method comprising: uniformly reading and writing multiple channels of the memory in a precise synchronization mode, where the memory has a redundancy ratio of at least 1:4; reading the data of each memory granule in each channel, where the read data meets the minimum data requirement of the ECC (error correction code) algorithm; and, when an erroneous memory granule is located by the ECC error correction algorithm from the read data, replacing the erroneous memory granule with a redundant memory granule and marking the erroneous memory granule as unusable.
In this implementation, when the redundancy ratio is at least 1:4 and the memory operates in the precise synchronization mode, each channel can contain redundant memory granules in addition to the data granules that store data and the ECC granules that store the ECC check code. When an erroneous memory granule is found, a redundant memory granule replaces it and the erroneous granule is marked as unusable, so the redundant granule takes over its work and the channel continues to operate normally. Because every channel has redundant memory granules, multiple erroneous memory granules can be corrected, which better meets the memory reliability requirements of practical applications.
Further, the method also comprises: when the ECC error correction algorithm finds an erroneous memory granule and no redundant memory granule remains, correcting the erroneous memory granule with the ECC error correction algorithm.
In this implementation, when an erroneous memory granule is found and no redundant memory granule remains (i.e. the redundant granules have been used up), the ECC error correction algorithm can still correct it. This further increases the number of correctable memory granules and better meets the reliability requirements of practical applications.
Further, when the ECC error correction algorithm finds an erroneous memory granule, and before the erroneous granule is replaced with a redundant memory granule, the method further comprises: determining that the ECC error correction algorithm has already been used for error correction.
In this implementation, the ECC error correction algorithm is applied to the first erroneous memory granule found, and erroneous granules found after that are replaced with redundant memory granules. On top of the one granule that the ECC algorithm can correct, the redundant granules handle the newly found erroneous granules, which increases the number of correctable memory granules and better meets the reliability requirements of practical applications.
Further, replacing the erroneous memory granule with a redundant memory granule when the ECC error correction algorithm finds it includes: replacing the erroneous memory granule with a redundant memory granule in the same channel as the erroneous granule.
Replacing the erroneous granule with a redundant granule from its own channel keeps the replacement in the same channel as the other working granules, so the channel identification logic for the data may not need to change.
Further, replacing the erroneous memory granule with a redundant memory granule also includes: when the channel containing the erroneous memory granule has no redundant memory granule, replacing it with a redundant memory granule from another of the channels that are read and written uniformly.
In this implementation, redundant memory granules may replace erroneous granules across channels, so all redundant granules can be used for error correction: even if an erroneous memory granule appears in a channel that has no redundant granule left, a redundant granule from another channel can replace it. This improves adaptability to different error scenarios.
Further, the memory has a data bit width of 32+8 bits and each memory granule is 4 bits wide; the minimum data processing unit of the ECC error correction algorithm is 128+16 bits.
For a memory with a 32+8-bit data width and 4-bit memory granules, such as DDR5 (Double Data Rate SDRAM 5, fifth-generation double data rate synchronous dynamic random access memory), the minimum data processing unit of the ECC error correction algorithm is 128+16 bits, and each channel then has one redundant memory granule available for error correction. In the two-channel case the memory can therefore correct up to 3 erroneous memory granules (two by redundant-granule replacement and one by the ECC algorithm), more than the existing approach that relies on ECC error correction alone, which better meets the memory reliability requirements of practical applications.
An embodiment of the present application further provides a memory controller comprising an ECC error correction circuit and a precise synchronization mode circuit. The precise synchronization mode circuit uniformly reads and writes multiple channels of the memory in the precise synchronization mode, where the memory has a redundancy ratio of at least 1:4. The ECC error correction circuit reads each memory granule in each channel and, when an erroneous memory granule is located by the ECC error correction algorithm from the read data, replaces it with a redundant memory granule and marks the erroneous granule as unusable; the read data meets the minimum data requirement of the ECC error correction algorithm.
This memory controller uses the redundant memory granules in the channels to correct erroneous memory granules. Because every channel has a redundant memory granule available for replacement, multiple erroneous memory granules can be corrected, which improves memory reliability and better meets the requirements of practical applications.
Further, the memory granules include data granules and redundant memory granules, and the memory controller further includes one gate per channel. Each gate is connected to the data granule interfaces and the redundant granule interface of its channel. When the ECC error correction circuit finds an erroneous data granule in the channel, the gate breaks the path through the data granule interface of the erroneous granule and connects that path to the redundant granule interface instead. Here, a data granule interface is an interface of the memory controller used to access a data granule in the memory, and a redundant granule interface is an interface of the memory controller used to access a redundant memory granule.
In this implementation, the per-channel gates let the memory controller switch its connections to the individual memory granules of each channel, so a redundant memory granule can take the place of an erroneous one. The circuit is simple, requires no major change to the structure of existing memory controllers, and is broadly applicable.
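The gate's switching behaviour can be modelled in software as a per-channel routing table. This is a hypothetical sketch, not the controller circuit: `ChannelGate`, its methods, and the layout of eight data granule interfaces plus one redundant granule interface at index 9 per channel are assumptions made for illustration, and only the same-channel connection (the Fig. 3 style) is modelled.

```python
class ChannelGate:
    """Hypothetical model of one channel's gate (same-channel connections only)."""

    def __init__(self, channel, data_lanes, spare_lane):
        self.channel = channel
        # Each logical data lane starts wired to the data granule interface with the same index.
        self.route = {lane: ("data", lane) for lane in data_lanes}
        # Index of this channel's redundant granule interface (assumed), or None once consumed.
        self.spare_lane = spare_lane
        # Data granule interfaces that have been switched out and marked unusable.
        self.retired = set()

    def switch_to_spare(self, lane):
        """Break the path to the failed data granule and connect the redundant granule instead."""
        if self.spare_lane is None:
            return False  # the redundant granule of this channel is already in use
        _, idx = self.route[lane]
        self.retired.add(idx)
        self.route[lane] = ("redundant", self.spare_lane)
        self.spare_lane = None
        return True

    def interface_for(self, lane):
        """Which granule interface currently serves this logical lane."""
        return self.route[lane]
```

For example, after `gate = ChannelGate(0, range(8), spare_lane=9)` and `gate.switch_to_spare(3)`, lane 3 is served by `("redundant", 9)`, while a second failure in the same channel returns `False` and has to be handled another way (ECC or, if wired, another channel's redundant granule).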
Further, each gate is also connected to the redundant granule interfaces of the other channels.
This provides cross-channel replacement capability: even if an erroneous memory granule occurs in a channel that has no redundant memory granule left, it can be replaced with a redundant memory granule from another channel, which improves adaptability to different error scenarios.
An embodiment of the present application further provides an electronic device including any of the memories described above and a memory controller. The memory has a redundancy ratio of at least 1:4, and the memory controller performs any of the memory error correction methods described above to correct errors in the memory.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
Fig. 1 is a schematic flowchart of a memory error correction method according to an embodiment of the present application;
Fig. 2 is a schematic diagram of the basic structure of a memory controller according to an embodiment of the present application;
Fig. 3 is a schematic diagram of a gate connection structure of a memory controller according to an embodiment of the present application;
Fig. 4 is a schematic diagram of another gate connection structure of a memory controller according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
Fig. 6 is a schematic diagram of the initial state of a DDR5 memory according to an embodiment of the present application;
Fig. 7 is a schematic diagram of the channel states when erroneous memory granules occur on two channels of a DDR5 memory according to an embodiment of the present application;
Fig. 8 is a schematic diagram of the channel states when 3 erroneous memory granules occur on channel 0 of a DDR5 memory according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
Embodiment 1:
An embodiment of the present application provides a memory error correction method which, as shown in Fig. 1, includes:
S101: Uniformly read and write multiple channels of the memory in the precise synchronization mode.
The precise synchronization mode, also called lockstep mode (Lockstep Channel Mode), uses identical redundant hardware components to process the same instruction at the same time, so that the data of one CPU cache line is distributed across several memory channels.
Once the precise synchronization mode is used, several channels of the memory store data from the same CPU cache line, which makes it possible for memory granules to be shared across channels.
S102: Read the data of each memory granule in each channel.
In the embodiments of the present application, the read data should meet the minimum data requirement of the ECC error correction algorithm, so that the algorithm operates normally and can identify errors in the read data, and thereby determine whether any memory granule is erroneous and, if so, which one.
S103: When an erroneous memory granule is located by the ECC error correction algorithm from the read data, replace it with a redundant memory granule and mark the erroneous memory granule as unusable.
In the embodiments of the present application, a Chipkill algorithm may be used to replace an erroneous memory granule with a redundant memory granule.
In addition, the ECC error correction algorithm may be implemented with, but is not limited to, an RS (Reed-Solomon) (144, 128) algorithm.
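The text does not give the RS (144, 128) parameters. One reading, used below purely as an assumption, is a Reed-Solomon code over GF(2^8) with 18 byte symbols (16 data symbols plus 2 check symbols, i.e. 128 data bits in a 144-bit word), where each memory granule contributes one symbol per ECC word. Two check symbols are enough to locate and correct one erroneous symbol, which matches the statement that ECC alone covers a single erroneous granule. The primitive polynomial 0x11D, the symbol ordering and the function names are illustrative choices, not the patent's implementation.

```python
# GF(2^8) log/antilog tables, primitive polynomial x^8+x^4+x^3+x^2+1 (0x11D), alpha = 2 (assumed)
EXP = [0] * 512
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 512):
    EXP[_i] = EXP[_i - 255]  # duplicate so EXP[LOG[a] + LOG[b]] never needs a modulo


def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_div(a, b):
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^8)")
    if a == 0:
        return 0
    return EXP[(LOG[a] - LOG[b]) % 255]


N_DATA = 16   # 16 data symbols = 128 bits
N_CHECK = 2   # 2 check symbols = 16 bits


def encode(data):
    """Build a codeword [p0, p1, d0..d15] whose syndromes at alpha^0 and alpha^1 are zero."""
    if len(data) != N_DATA:
        raise ValueError("expected 16 data symbols")
    a = b = 0
    for i, d in enumerate(data, start=N_CHECK):
        a ^= d                   # sum of c_i
        b ^= gf_mul(d, EXP[i])   # sum of c_i * alpha^i
    p1 = gf_div(a ^ b, 3)        # solves p1 * (1 + alpha) = a + b, and 1 + alpha = 3 here
    p0 = a ^ p1
    return [p0, p1] + list(data)


def decode(word):
    """Correct at most one erroneous symbol; return (data, error_position or None)."""
    s0 = s1 = 0
    for i, r in enumerate(word):
        s0 ^= r
        s1 ^= gf_mul(r, EXP[i])
    if s0 == 0 and s1 == 0:
        return list(word[N_CHECK:]), None
    if s0 == 0 or s1 == 0:
        raise ValueError("uncorrectable error pattern")
    pos = (LOG[s1] - LOG[s0]) % 255   # for a single error, s1 / s0 = alpha^pos
    if pos >= len(word):
        raise ValueError("uncorrectable error pattern")
    fixed = list(word)
    fixed[pos] ^= s0                  # s0 is the error value itself
    return fixed[N_CHECK:], pos
```

With a minimum distance of 3, a code like this corrects any single-symbol error but may miscorrect some two-symbol errors rather than flag them, which is why the scheme above adds redundant-granule replacement on top of ECC instead of relying on ECC for a second failed granule.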
In the embodiments of the present application, the memory should have a redundancy ratio of at least 1:4. The data bit width of a memory consists of two parts: the width of the data part and the width of the redundant part. For example, a conventional DDR4 memory has a data bit width of 64+8 bits, i.e. 64 data bits plus 8 redundancy bits, and a conventional DDR5 memory has a data bit width of 32+8 bits, i.e. 32 data bits plus 8 redundancy bits.
The redundancy ratio is the ratio of the redundancy bit width to the data bit width. For example, the redundancy ratio is 1:8 for DDR4 memory (8/64) and 1:4 for DDR5 memory (8/32).
Because the minimum data requirement of the ECC error correction algorithm (hereinafter the ECC word) is 128+16 bits, a DDR4 memory needs all of its redundancy bits to store the check data of the ECC error correction algorithm, so no redundant memory granule is left over.
For a memory with a 1:4 redundancy ratio, such as DDR5, each channel still has one redundant memory granule after the ECC check data is stored. This is why the memory in the embodiments of the present application has a redundancy ratio of at least 1:4.
When the data of each memory granule in each channel is read, the read data should satisfy the ECC word requirement so that the ECC error correction algorithm can process it.
For example, assume the ECC word is 128+16 bits and the memory is a two-channel DDR5 memory with 4-bit memory granules. Reading both channels twice yields 128+32 bits (2 reads × 2 channels × (32+8) bits), which satisfies the 128+16-bit requirement of the ECC word. Each of the two channels then has one redundant memory granule, so the memory has two redundant memory granules in total.
Similarly, for a four-channel DDR5 memory with 4-bit memory granules, a single read of all 4 channels yields 128+32 bits, which meets the 128+16-bit requirement of the ECC word. Each of the 4 channels then has one redundant memory granule, so the DDR5 memory has 4 redundant memory granules.
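The bit-width arithmetic in the preceding paragraphs can be checked with a few lines of Python. The helpers below are illustrative and not from the source; they assume the channels are read in lockstep until exactly 128 data bits have been collected, and that every granule supplies its full width on each read.

```python
from fractions import Fraction


def redundancy_ratio(data_bits, redundancy_bits):
    """Redundancy bits divided by data bits, e.g. 32+8 gives 1/4."""
    return Fraction(redundancy_bits, data_bits)


def spare_granules(channels, data_bits, redundancy_bits, granule_bits,
                   ecc_data_bits=128, ecc_check_bits=16):
    """Redundant granules left once one ECC word's check bits are stored (hypothetical helper)."""
    bits_per_read = channels * data_bits          # data bits delivered by one lockstep read
    if ecc_data_bits % bits_per_read:
        raise ValueError("ECC word is not a whole number of lockstep reads")
    reads = ecc_data_bits // bits_per_read        # reads needed to collect one ECC word
    redundancy_total = reads * channels * redundancy_bits
    leftover = redundancy_total - ecc_check_bits
    if leftover < 0:
        raise ValueError("not enough redundancy bits to hold the ECC check bits")
    return leftover // (reads * granule_bits)     # one granule supplies granule_bits per read
```

Under these assumptions, `spare_granules(1, 64, 8, 4)` (DDR4, 64+8) gives 0, `spare_granules(2, 32, 8, 4)` (two-channel DDR5, two reads) gives 2, and `spare_granules(4, 32, 8, 4)` (four-channel DDR5, one read) gives 4, matching the counts stated above.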
In the embodiments of the present application, the ECC error correction algorithm's own ability to correct one erroneous memory granule can also be combined with the redundant granules to further improve the error correction capability of the memory.
For example, for a two-channel DDR5 memory with 4-bit memory granules whose two channels are read and written uniformly in the precise synchronization mode, each channel has one redundant memory granule. Replacing erroneous granules with redundant granules can therefore correct two erroneous memory granules, and the ECC error correction algorithm can correct one more.
In one possible implementation, the ECC error correction algorithm first corrects the first erroneous memory granule found, and erroneous memory granules found later are corrected by replacing them with redundant memory granules.
In another possible implementation, erroneous memory granules are first corrected by replacing them with redundant memory granules, and once the redundant memory granules have been used up, the ECC error correction algorithm corrects the next erroneous memory granule found.
In this embodiment, the controller may be configured to replace an erroneous memory granule only with the redundant memory granule of its own channel, so that error correction uses redundant granules within each channel.
In that case, once the redundant memory granule of a channel has been used up (i.e. the channel has no redundant memory granule left), if a further erroneous memory granule is found in that channel and the ECC error correction algorithm has not yet been used to correct any granule, the ECC algorithm corrects it. If the ECC algorithm has already been used to correct a granule, however, the erroneous granule cannot be corrected even if another channel still has a redundant memory granule; a memory failure may then be reported so that an engineer can repair or replace the memory.
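The two orderings and the same-channel-only fallback described above can be expressed as a small decision routine. This is a hypothetical bookkeeping sketch, not the controller's logic: `CorrectionState`, `handle_new_error` and the returned action strings are invented names, and the ECC algorithm is assumed to cover exactly one failed granule.

```python
from dataclasses import dataclass


@dataclass
class CorrectionState:
    """Hypothetical bookkeeping for one group of lockstep channels."""
    spares: dict              # channel -> unused redundant granules in that channel
    ecc_used: bool = False    # True once ECC is already covering one failed granule
    ecc_first: bool = False   # True: ECC handles the first failure; False: spares go first


def handle_new_error(state, channel):
    """Decide how to cover a newly located erroneous granule (same-channel spares only)."""
    def use_ecc():
        state.ecc_used = True
        return "ecc"

    def use_spare():
        state.spares[channel] -= 1
        return "spare"

    if state.ecc_first and not state.ecc_used:
        return use_ecc()
    if state.spares.get(channel, 0) > 0:
        return use_spare()
    if not state.ecc_used:
        return use_ecc()
    # Spares of this channel and the ECC budget are exhausted; other channels' spares
    # are not reachable in this configuration, so the failure must be reported.
    return "report_failure"
```

With spares first and one spare per channel, three failures in channel 0 yield `"spare"`, `"ecc"`, `"report_failure"`, even though channel 1 still holds an unused redundant granule, which is the limitation that cross-channel replacement removes.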
Alternatively, this embodiment may be configured to allow redundant memory granules to replace erroneous memory granules across channels. That is, when the channel containing the erroneous memory granule has no redundant memory granule, a redundant memory granule in another of the uniformly read and written channels can replace it. All redundant memory granules can then be fully used, so the scheme copes with a wider range of memory granule failure patterns.
For example, as shown in Fig. 8, after the redundant memory granule in channel 0 has been used, the redundant memory granule in channel 1 can replace the next erroneous memory granule.
When cross-channel replacement is allowed, the redundant memory granule in the erroneous granule's own channel may be used first, as described above, and a redundant memory granule from another channel used only when the own channel has none left.
However, the order in which redundant memory granules are used is not restricted: even when the erroneous granule's own channel still has a redundant memory granule, a redundant memory granule from another channel may be used instead.
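Spare selection with an optional cross-channel fallback can be sketched as follows. `pick_spare` and its parameters are hypothetical names; `allow_cross_channel=False` models the same-channel-only configuration, and `prefer_own_channel=False` models the unrestricted ordering just described.

```python
def pick_spare(spares, bad_channel, allow_cross_channel=True, prefer_own_channel=True):
    """Return the channel whose redundant granule replaces a failed granule, or None.

    spares maps channel -> number of unused redundant granules; the chosen entry is decremented.
    """
    own = [bad_channel] if spares.get(bad_channel, 0) > 0 else []
    others = [ch for ch in sorted(spares) if ch != bad_channel and spares[ch] > 0]
    if not allow_cross_channel:
        candidates = own
    elif prefer_own_channel:
        candidates = own + others
    else:
        candidates = others + own
    if not candidates:
        return None  # no reachable redundant granule: fall back to ECC or report a failure
    chosen = candidates[0]
    spares[chosen] -= 1
    return chosen
```

In a sequence like the one described for Fig. 8, starting from `{0: 1, 1: 1}`, three failures in channel 0 are assigned channel 0's spare, then channel 1's spare, and then `None`, at which point the ECC algorithm would take over.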
In the embodiments of the present application, after a redundant memory granule replaces an erroneous memory granule, it performs the function of the erroneous granule and is no longer a redundant memory granule. The erroneous memory granule, having been marked as unusable (for example, marked as faulty or unavailable in the memory controller), is discarded and does not become a redundant memory granule after being replaced. Redundant memory granules are thus "consumables" in the embodiments of the present application: they lose the "redundant" label once used.
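The "consumable" behaviour amounts to a simple role transition for two granules. The sketch below is hypothetical bookkeeping, not the controller's data structure; `Role`, `substitute` and the `(channel, index)` keys are invented for illustration.

```python
from enum import Enum


class Role(Enum):
    DATA = "data"
    ECC = "ecc"
    REDUNDANT = "redundant"
    RETIRED = "retired"


def substitute(roles, failed, spare):
    """The spare takes over the failed granule's role; the failed granule is retired."""
    if roles[spare] is not Role.REDUNDANT:
        raise ValueError("chosen granule is not an unused redundant granule")
    if roles[failed] in (Role.REDUNDANT, Role.RETIRED):
        raise ValueError("only an in-service granule can be replaced")
    roles[spare] = roles[failed]   # the spare loses its "redundant" label
    roles[failed] = Role.RETIRED   # marked unusable and never counted as redundant again
    return roles
```

After `substitute(roles, ("ch0", 3), ("ch0", 9))`, granule 9 of channel 0 carries the data role and granule 3 is retired, so the number of remaining redundant granules drops by one, as the text describes.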
An embodiment of the present application further provides a memory controller, shown in Fig. 2, which may include an ECC error correction circuit and a precise synchronization mode circuit, where:
the precise synchronization mode circuit may be used to uniformly read and write multiple channels of the memory in the precise synchronization mode;
the ECC error correction circuit may be used to read each memory granule in each channel and, when an erroneous memory granule is located by the ECC error correction algorithm from the read data, replace it with a redundant memory granule and mark the erroneous memory granule as unusable.
In the memory, the memory granules include data granules that store data, ECC granules that store the check data of the ECC error correction algorithm, and the remaining redundancy granules (i.e. redundant memory granules).
To implement the replacement of erroneous memory granules by redundant memory granules, the memory controller further includes one gate per channel, as shown in Fig. 3.
Each gate is connected to the data granule interfaces and the redundant granule interface of its channel. When the ECC error correction circuit finds an erroneous data granule in the channel, the gate breaks the path through the data granule interface of the erroneous granule and connects that path to the redundant granule interface instead.
In most cases the memory and the memory controller are not connected directly, but whatever circuits sit between them, data can still be exchanged between the two, i.e. every memory granule of the memory remains accessible to the memory controller. In the embodiments of the present application, the memory controller interface used to access a data granule is called a data granule interface, and the interface used to access a redundant granule is called a redundant granule interface.
Each gate may be connected only to the data granule interfaces and the redundant granule interface of its own channel, without connections to the redundant granule interfaces of the other channels, as shown in Fig. 3.
Alternatively, each gate may be connected to the data granule interfaces and the redundant granule interface of its own channel and also to the redundant granule interfaces of the other channels, so that after an erroneous memory granule is found in one channel, the memory controller can replace it with a redundant memory granule from another channel, as shown in Fig. 4.
The ECC error correction circuit may also run the ECC error correction algorithm itself to correct errors in the memory.
The ECC error correction circuit and the precise synchronization mode circuit may be implemented with existing circuits of those kinds, and each gate can be implemented with an ordinary multi-way gating circuit (multiplexer).
An embodiment of the present application further provides an electronic device which, as shown in Fig. 5, includes a memory and a memory controller.
The memory has a redundancy ratio of at least 1:4, for example DDR5 memory.
The memory controller may be the memory controller described above, so that error correction of the memory is carried out according to the memory error correction method provided in this embodiment.
In this embodiment, the electronic device may be any device that has a memory and a memory controller, such as a mobile phone, a computer or a server.
It should be understood that the memory with the highest redundancy ratio currently on the market is DDR5 memory, whose redundancy ratio is 1:4. If memory with a higher redundancy ratio is developed in the future, the scheme provided in this embodiment can be applied to it as well.
With the memory error correction method, memory controller and electronic device provided by this embodiment, erroneous memory granules can be replaced using the redundant memory granules in the channels. Because each channel provides a redundant memory granule for substitution, multiple erroneous memory granules can be handled, which improves memory reliability and better meets the reliability requirements of practical applications.
Example two:
This embodiment illustrates the present application with a specific memory error correction process applied to DDR5 memory.
The DDR5 memory is addressed uniformly in precise synchronization mode: its 2 channels (channel 0 and channel 1) are read and written together, and data is distributed across both channels.
The ECC error correction algorithm is RS(144, 128), so each ECC word is 128+16 bits.
DDR5 memory has a data width of 32+8 bits per channel, so two reads of both channels yield 128+32 bits. The ECC error correction algorithm uses only 16 of the 32 redundant bits, which leaves 16 redundant bits across the 2 channels, i.e., 8 bits per channel. Since a 4-bit granule read twice supplies 8 bits, this is exactly one redundant memory granule per channel.
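The bit budget above can be checked with a few lines of arithmetic. The sketch below assumes only the figures stated in this example (32+8 bits per channel, 2 channels, 4-bit granules read twice per ECC word, and an RS(144, 128) ECC word); the constant and function names are hypothetical.

```python
from fractions import Fraction

# Figures from Example two (DDR5, two channels read together, 4-bit granules).
DATA_BITS_PER_CHANNEL = 32
REDUNDANT_BITS_PER_CHANNEL = 8
CHANNELS = 2
READS_PER_ECC_WORD = 2                   # each granule is read twice per ECC word
GRANULE_WIDTH = 4                        # bits delivered by one granule per read
ECC_DATA_BITS, ECC_CHECK_BITS = 128, 16  # RS(144, 128)


def redundancy_ratio():
    """Redundant-to-data bit ratio of one channel; the method requires at least 1:4."""
    return Fraction(REDUNDANT_BITS_PER_CHANNEL, DATA_BITS_PER_CHANNEL)


def redundancy_budget():
    """Return (data bits, redundant bits, spare bits per channel, spare granules per channel)."""
    data = DATA_BITS_PER_CHANNEL * CHANNELS * READS_PER_ECC_WORD
    redundant = REDUNDANT_BITS_PER_CHANNEL * CHANNELS * READS_PER_ECC_WORD
    if data != ECC_DATA_BITS:
        raise ValueError("the reads must exactly fill the ECC data field")
    spare_per_channel = (redundant - ECC_CHECK_BITS) // CHANNELS  # redundancy the ECC code leaves unused
    spare_granules = spare_per_channel // (GRANULE_WIDTH * READS_PER_ECC_WORD)
    return data, redundant, spare_per_channel, spare_granules


assert redundancy_ratio() >= Fraction(1, 4)
print(redundancy_budget())  # (128, 32, 8, 1)
```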
In the initial state, shown in Fig. 6, the 18 four-bit memory granules D0, D1, D2, D3, D4, D5, D6, D7 and C0 of the two channels are each read twice to form a complete ECC word (18 × 4 bits × 2 = 144 bits). The C1 memory granules of both channels are the redundant memory granules.
The RS(144, 128) x8 chipkill algorithm is used (x8 chipkill denotes a chipkill algorithm with an 8-bit symbol size). Suppose the algorithm finds that one memory granule of channel 0 is in error, for example the D0 memory granule of channel 0 as shown in Fig. 7. The C1 memory granule of channel 0 can then replace the faulty D0 granule: all subsequent read and write commands addressed to D0 are redirected to C1, and D0 is no longer used.
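One way to see why an x8 chipkill code can absorb a whole faulty granule: each 4-bit granule is read twice per ECC word, so it contributes exactly 8 bits, i.e., one 8-bit symbol, and RS(144, 128) viewed over 8-bit symbols has 18 symbols of which 16 carry data, so it can correct (18 - 16) / 2 = 1 symbol. The sketch below assumes this granule-to-symbol mapping and a particular nibble order inside each symbol; it only packs symbols and counts how many differ, it does not implement Reed-Solomon decoding, and the function names are hypothetical.

```python
def pack_symbols(beat0, beat1):
    """Pair each granule's two 4-bit reads into one 8-bit symbol.

    beat0, beat1: nibbles (0..15) from the 18 granules of the ECC word, same order.
    Assumption: beat 1 goes in the high nibble, beat 0 in the low nibble.
    """
    if len(beat0) != len(beat1):
        raise ValueError("both beats must cover the same granules")
    symbols = []
    for lo, hi in zip(beat0, beat1):
        if not (0 <= lo < 16 and 0 <= hi < 16):
            raise ValueError("a 4-bit granule delivers 4 bits per read")
        symbols.append((hi << 4) | lo)
    return symbols


def correctable_symbols(n_bits=144, k_bits=128, symbol_bits=8):
    """An RS(n, k) code corrects floor((n - k) / 2) symbol errors."""
    n, k = n_bits // symbol_bits, k_bits // symbol_bits
    return (n - k) // 2


good = pack_symbols([0] * 18, [0] * 18)
beat0, beat1 = [0] * 18, [0] * 18
beat0[5], beat1[5] = 0xF, 0xA  # hypothetical: granule 5 returns garbage on both reads
bad = pack_symbols(beat0, beat1)
print([i for i, (g, b) in enumerate(zip(good, bad)) if g != b])  # [5]
print(correctable_symbols())                                     # 1
```

Under this mapping, a granule that fails completely corrupts a single symbol, which is within the one-symbol correction capability of the code.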
The ECC word now consists of D1, D2, D3, D4, D5, D6, D7, C0 and C1 of channel 0 and D0, D1, D2, D3, D4, D5, D6, D7 and C0 of channel 1.
It should be understood that Fig. 7 merely illustrates a fault in D0. In fact, whichever of the 8 memory granules D0, D1, D2, D3, D4, D5, D6 and D7 of channel 0 fails, it can be replaced with the C1 memory granule.
With this new ECC word, suppose channel 1 is also found to have an erroneous memory granule, for example the D3 memory granule of channel 1 shown as faulty in Fig. 7. The C1 memory granule of channel 1 can replace the faulty D3 granule: all subsequent read and write commands to D3 of channel 1 are redirected to C1 of channel 1, and D3 of channel 1 is left unused.
The ECC word now consists of D1, D2, D3, D4, D5, D6, D7, C0 and C1 of channel 0 and D0, D1, D2, D4, D5, D6, D7, C0 and C1 of channel 1.
The newly formed ECC word can still correct one further erroneous memory granule with the ECC error correction algorithm, and that granule may be in either channel 0 or channel 1.
In the extreme case, all three faulty memory granules may be in the same channel, for example channel 0 as shown in Fig. 8. With the scheme of this embodiment, the first faulty memory granule is replaced with the C1 memory granule of channel 0. When a second faulty memory granule is found in channel 0, the controller checks whether the C1 memory granule of channel 1 is already in use. In this example it is not, so the C1 memory granule of channel 1 replaces the second faulty memory granule of channel 0. The third faulty memory granule is then corrected by the ECC error correction algorithm.
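The replacement order described in this example (the faulty granule's own channel spare first, then a spare from the other channel, then the ECC algorithm) can be sketched as a small state machine. This is a hypothetical model, not the controller's actual logic: the class `SpareManager`, its `cross_channel` flag (Fig. 4 wiring when true, Fig. 3 wiring when false) and the assumption that the RS code can permanently absorb exactly one further faulty granule are illustrative choices, and the granule names D4 and D6 used below for the Fig. 8 case are made up.

```python
class SpareManager:
    """Hypothetical model of the granule-replacement policy in Example two."""

    def __init__(self, channels=2, cross_channel=True, ecc_correctable=1):
        self.free_spare = {ch: True for ch in range(channels)}  # is C1 of each channel unused?
        self.cross_channel = cross_channel                      # Fig. 4 wiring if True, Fig. 3 if False
        self.ecc_left = ecc_correctable                         # faulty granules the RS code can still absorb
        self.remap = {}                                         # (channel, granule) -> (channel, "C1")

    def on_error(self, ch, granule):
        """Handle a granule that the ECC algorithm reports as faulty."""
        others = [c for c in self.free_spare if c != ch] if self.cross_channel else []
        for target in [ch] + others:                            # own channel's spare first
            if self.free_spare[target]:
                self.free_spare[target] = False
                self.remap[(ch, granule)] = (target, "C1")
                return f"replaced by C1 of channel {target}"
        if self.ecc_left > 0:
            self.ecc_left -= 1                                  # ECC corrects this granule on every read
            return "corrected by ECC"
        return "uncorrectable"


m = SpareManager()
for ch, g in [(0, "D0"), (0, "D4"), (0, "D6")]:  # hypothetical Fig. 8 faults, all in channel 0
    print(ch, g, "->", m.on_error(ch, g))
# 0 D0 -> replaced by C1 of channel 0
# 0 D4 -> replaced by C1 of channel 1
# 0 D6 -> corrected by ECC
```

Under this model, the Fig. 3 wiring (`cross_channel=False`) tolerates only two faulty granules in the same channel, because the second one cannot borrow the other channel's spare and must be left to the ECC algorithm.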
Using precise synchronization mode on DDR5 memory, the scheme of this embodiment can correct up to 3 faulty memory granules per 128 bits of data and supports a variety of error scenarios. A conventional scheme that relies on the ECC error correction algorithm alone can correct only 1 memory granule, so this scheme significantly improves error correction capability and the reliability of DDR memory.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
In addition, units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
Furthermore, the functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
In this context, a plurality means two or more.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. A memory error correction method, characterized by comprising the following steps:
reading and writing a plurality of channels of the memory uniformly in a precise synchronization mode, the memory being a memory with a redundancy ratio of at least 1:4;
reading the data of each memory granule in each channel, the read data satisfying the minimum data requirement of an ECC (error correction code) error correction algorithm; and
when an erroneous memory granule is found by the ECC error correction algorithm from the read data, replacing the erroneous memory granule with a redundant memory granule and taking the erroneous memory granule out of use.
2. The memory error correction method of claim 1, further comprising:
when the ECC error correction algorithm finds an erroneous memory granule and no redundant memory granule is available, correcting the erroneous memory granule with the ECC error correction algorithm.
3. The memory error correction method of claim 1, wherein, when an erroneous memory granule is found by the ECC error correction algorithm and before the erroneous memory granule is replaced with a redundant memory granule, the method further comprises:
determining that the ECC error correction algorithm has been used for error correction.
4. The memory error correction method of claim 1, wherein replacing the erroneous memory granule with a redundant memory granule when the erroneous memory granule is found by the ECC error correction algorithm comprises:
when the erroneous memory granule is found by the ECC error correction algorithm, replacing it with a redundant memory granule in the channel where the erroneous memory granule is located.
5. The memory error correction method of claim 4, wherein replacing the erroneous memory granule with a redundant memory granule further comprises:
when the channel where the erroneous memory granule is located has no available redundant memory granule, replacing the erroneous memory granule with a redundant memory granule in another of the uniformly read and written channels.
6. The memory error correction method according to any one of claims 1 to 5, wherein the memory has a data bit width of 32+8 bits and a single memory granule is 4 bits wide, and the minimum data processing unit of the ECC error correction algorithm is 128+16 bits.
7. A memory controller, comprising: an ECC error correction circuit and a precise synchronization mode circuit;
the precise synchronization mode circuit is configured to read and write a plurality of channels of the memory uniformly in a precise synchronization mode, the memory being a memory with a redundancy ratio of at least 1:4;
the ECC error correction circuit is configured to read each memory granule in each channel and, when an erroneous memory granule is found by an ECC error correction algorithm from the read data, to replace the erroneous memory granule with a redundant memory granule and take the erroneous memory granule out of use, wherein the read data satisfies the minimum data requirement of the ECC error correction algorithm.
8. The memory controller of claim 7, wherein the memory granules comprise data granules and redundant memory granules, and the memory controller further comprises a gate for each channel;
the gate is connected to the data granule interfaces and the redundant granule interface of its channel and is configured to, when the ECC error correction circuit finds an erroneous data granule in the channel, disconnect the path to the data granule interface corresponding to the erroneous data granule and connect that path to the redundant granule interface instead; wherein:
the data granule interface is the interface in the memory controller used to access data granules, and the redundant granule interface is the interface in the memory controller used to access redundant memory granules.
9. The memory controller of claim 8, wherein each gate is further connected to the redundant granule interfaces of the channels other than the channel to which the gate belongs.
10. An electronic device, comprising: a memory and a memory controller;
the memory is a memory with a redundancy ratio of at least 1:4;
the memory controller is configured to execute the memory error correction method according to any one of claims 1 to 6, so as to implement error correction on the memory.
CN202011461460.6A 2020-12-07 2020-12-07 Memory error correction method, memory controller and electronic equipment Active CN112579342B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011461460.6A CN112579342B (en) 2020-12-07 2020-12-07 Memory error correction method, memory controller and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011461460.6A CN112579342B (en) 2020-12-07 2020-12-07 Memory error correction method, memory controller and electronic equipment

Publications (2)

Publication Number Publication Date
CN112579342A true CN112579342A (en) 2021-03-30
CN112579342B CN112579342B (en) 2024-02-13

Family

ID=75131557

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011461460.6A Active CN112579342B (en) 2020-12-07 2020-12-07 Memory error correction method, memory controller and electronic equipment

Country Status (1)

Country Link
CN (1) CN112579342B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114518972A (en) * 2022-02-14 2022-05-20 海光信息技术股份有限公司 Memory error processing method and device, memory controller and processor
WO2023179634A1 (en) * 2022-03-22 2023-09-28 华为技术有限公司 Data writing method and processing system
WO2023202592A1 (en) * 2022-04-19 2023-10-26 华为技术有限公司 Data writing method and processing system
WO2023236996A1 (en) * 2022-06-08 2023-12-14 华为技术有限公司 Memory module and electronic device
WO2024198468A1 (en) * 2023-03-31 2024-10-03 华为技术有限公司 Memory error correction method, memory module, memory controller, and processor

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001297594A (en) * 2000-04-06 2001-10-26 Hewlett Packard Co <Hp> Method for providing capability of reprogramming memory redundancy in field
CN102332302A (en) * 2011-07-19 2012-01-25 北京时代全芯科技有限公司 Phase change memory and redundancy replacing method for same
CN103295649A (en) * 2013-04-28 2013-09-11 上海宏力半导体制造有限公司 Method for improving reliability of a nonvolatile memory
CN109328340A (en) * 2017-09-30 2019-02-12 华为技术有限公司 Detection method, device and the server of memory failure
CN111294059A (en) * 2019-12-26 2020-06-16 成都海光集成电路设计有限公司 Encoding method, decoding method, error correction method and related device
CN111312321A (en) * 2020-03-02 2020-06-19 电子科技大学 Memory device and fault repairing method thereof
CN111459712A (en) * 2020-04-16 2020-07-28 上海安路信息科技有限公司 SRAM type FPGA single event upset error correction method and single event upset error correction circuit

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HUANG XIAOHE等: "In-memory computing to break the memory wall", 《CHINESE PHYSICS B》 *

Also Published As

Publication number Publication date
CN112579342B (en) 2024-02-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant