CN112181703B - CAM supporting soft error retransmission mechanism between capacity processor and memory board and application method - Google Patents
- Publication number: CN112181703B (application number CN202011043590.8A)
- Authority: CN (China)
- Prior art keywords: read, write, record, modify, request
- Legal status: Active (as listed by Google Patents; this is not a legal conclusion)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0793—Remedial or corrective actions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/079—Root cause analysis, i.e. error or fault diagnosis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Techniques For Improving Reliability Of Storages (AREA)
Abstract
The invention discloses a CAM supporting a soft error retransmission mechanism between a processor and a memory board, and an application method. The CAM comprises three parts. The first is a record storage unit holding n records, each containing a valid flag, a read address, a read-modify-write correlation flag and a retransmission count. The second is a comparator array of n comparators. Each comparator has two inputs: the write address and the read address in the corresponding record. It is enabled by that record's valid flag. The third is a read-modify-write correlation judgment module, which determines from the outputs of the n comparators whether a read-modify-write correlation exists. The CAM supports a read request retransmission mechanism based on a correlation CAM. This mechanism can tolerate inter-board soft errors between the main processor and off-chip memory to the greatest extent possible while affecting system performance as little as possible. It is simple in structure, easy to implement and non-blocking, and tolerates inter-board soft errors with both high performance and high efficiency.
Description
Technical Field
The invention relates to fault-tolerance techniques for communication between a processor and memory. In particular, it relates to a CAM (content-addressable memory) supporting a soft error retransmission mechanism between a processor and a memory board, and an application method thereof.
Background
In mainstream high-performance processor designs, the processor communicates with off-chip memory through the mainboard. Most commonly, the processor follows the DDR protocol and issues memory read and write commands containing the memory addresses, together with write data, and these are transmitted over the mainboard to the off-chip memory. The off-chip memory, also following the DDR protocol, returns read response data to the processor over the mainboard. The DDR protocol uses techniques such as ODT (on-die termination) to suppress crosstalk and reflection between signals. However, as process technology advances, lower supply voltages and higher clock frequencies amplify the effect of noise sources such as particle strikes and crosstalk. These can cause transient errors, i.e. soft errors, in the transmitted data. When the system operates in a harsh environment, the probability of soft errors in inter-board communication rises sharply.
To tolerate soft errors between the processor and the memory board, a read request retransmission mechanism is needed. One of the main difficulties in designing such a mechanism is retransmitting a request without violating the dependences between read and write requests. Specifically, the data stored at the target address of a read request that is to be retransmitted must not be modified by a write request in two windows: between the initial read and the first retransmitted read, and between any retransmitted read and the previous one. Consider each access sent to memory. If a write request targets the same address as some in-flight (outstanding) read request, the contents of that address change once the write request is issued. In that case, even if the read request returned erroneous data because of an inter-board soft error, retransmitting it would break correctness, because the data read back would be the modified data rather than the target data. Write requests can therefore make some inter-board soft errors impossible to tolerate. How to retransmit requests without violating read/write dependences, and thereby improve the reliability of inter-board soft error tolerance, has become a key technical problem.
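The hazard can be illustrated with a toy model. The following is a minimal sketch under stated assumptions. Off-chip memory is modelled as a Python dictionary and an inter-board soft error as a single flipped bit. The addresses, data values and helper names (`issue_read`, `issue_write`, `transfer`) are hypothetical.

```python
# Hypothetical toy model: off-chip memory as a dict of address -> data word.
memory = {0x100: 0xAAAA}

def issue_read(addr: int) -> int:
    """Data observed by a read request when it reaches memory."""
    return memory[addr]

def issue_write(addr: int, data: int) -> None:
    """A write request updates memory as soon as it is issued."""
    memory[addr] = data

def transfer(data: int, flip_bit: int = -1) -> int:
    """Inter-board transfer; flip_bit >= 0 models a transient soft error."""
    return data ^ (1 << flip_bit) if flip_bit >= 0 else data

expected = issue_read(0x100)               # data the program asked for
received = transfer(expected, flip_bit=3)  # response corrupted on the board
issue_write(0x100, 0x5555)                 # same-address write while the read is in flight
retried = transfer(issue_read(0x100))      # naive retransmission of the read

print(hex(expected), hex(received), hex(retried))  # 0xaaaa 0xaaa2 0x5555
```

The retransmitted read arrives without a transfer error, yet it returns 0x5555 instead of 0xaaaa. Retrying was unsafe here, and this is the case the correlation CAM must detect.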
Disclosure of Invention
The technical problem to be solved by the invention is to provide a CAM, and an application method thereof, with the following properties:
- it supports a read request retransmission mechanism based on a correlation CAM;
- it tolerates inter-board soft errors between the main processor and off-chip memory to the greatest extent possible while affecting system performance as little as possible;
- it is simple in structure, easy to implement and non-blocking;
- it offers both high performance and high efficiency.
In order to solve the technical problems, the invention adopts the technical scheme that:
a CAM for supporting a soft error retransmission mechanism between a processor and a memory board, comprising:
a record storage unit comprising n records, each record containing a valid flag, a read address, a read-modify-write correlation flag and a retransmission count;
a comparator array comprising n comparators in one-to-one correspondence with the records in the record storage unit. Each comparator has two inputs: one is the write address and the other is the read address in the corresponding record. The control (enable) signal of each comparator is the valid flag of the corresponding record;
and a read-modify-write correlation judgment module, configured to determine from the outputs of the n comparators whether a read-modify-write correlation exists.
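A behavioural sketch of the record storage unit and comparator array may help. It assumes Python data structures in place of hardware registers and models the parallel comparators as a list comprehension. It also assumes the judgment module is a simple OR over the comparator outputs. The patent text does not specify the module's internals, and all names here are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Record:
    valid: bool = False   # valid flag (V)
    address: int = 0      # read address of the outstanding read request
    war: bool = False     # read-modify-write correlation flag (WAR)
    retry_cnt: int = 0    # retransmission count (Retry Cnt)

def comparator_array(records: List[Record], write_addr: int) -> List[bool]:
    """One comparator per record: compare the write address with the stored
    read address; the record's valid flag acts as the comparator's enable."""
    return [r.valid and r.address == write_addr for r in records]

def rmw_correlated(hits: List[bool]) -> bool:
    """Assumed judgment module: the write is correlated if any comparator hits."""
    return any(hits)

records = [Record(True, 0x40), Record(False, 0x80), Record(True, 0x80)]
hits = comparator_array(records, 0x80)
print(hits, rmw_correlated(hits))  # [False, False, True] True
```

The second record holds the same address but is invalid, so its comparator is masked by its valid flag. That masking is the role the control signal plays.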
Optionally, the record storage unit contains 64 records in total.
Optionally, the valid flag and the read-modify-write correlation flag are each 1 bit wide.
Optionally, the retransmission count is 8 bits wide.
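As a rough sizing exercise, the width of one record is the sum of its field widths. The sketch below assumes a hypothetical 40-bit read address, which the patent does not specify. A real design might store a narrower cache-line address instead.

```python
RECORDS = 64         # optional record count
VALID_BITS = 1       # valid flag
WAR_BITS = 1         # read-modify-write correlation flag
RETRY_CNT_BITS = 8   # retransmission count

def cam_storage_bits(addr_bits: int, records: int = RECORDS) -> int:
    """Total record-storage bits for a given read-address width."""
    per_record = VALID_BITS + addr_bits + WAR_BITS + RETRY_CNT_BITS
    return per_record * records

ADDR_BITS = 40  # hypothetical physical address width
print(cam_storage_bits(ADDR_BITS))  # 3200 (50 bits x 64 records = 400 bytes)
print(2 ** RETRY_CNT_BITS - 1)      # 255, the largest value the count can hold
```

An 8-bit count can hold at most 255. The preset retransmission threshold used by the application method should therefore not exceed that value.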
In addition, the invention provides an application method of the CAM supporting a soft error retransmission mechanism between a processor and a memory board, comprising the following steps:
1) Wait for a memory access request. If it is a read request, go to step 2); if it is a write request, go to step 3); if it is read response data, go to step 4);
2) Create a new record in the record storage unit and fill the access address of the read request into its read address. Set the new record's valid flag to 1, its read-modify-write correlation flag to 0 and its retransmission count to 0. Return to step 1);
3) For the write address of the write request, use the comparator array and the read-modify-write correlation judgment module to determine whether the write request has a read-modify-write correlation with any record whose valid flag is 1. If so, set the read-modify-write correlation flag of the corresponding record to 1. Then return to step 1);
4) Check the received read response data. If the check finds no uncorrectable error, return the correct read response data, set the valid flag of the record corresponding to the read address of the read response data to 0, and return to step 1). Otherwise, go to step 5);
5) For the read request record corresponding to the read response data, determine whether its read-modify-write correlation flag is 0 and its retransmission count is below a preset threshold. If so, send a retransmission request, increment the retransmission count of that record by 1, and return to step 1). Otherwise, end and exit.
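The five steps can be read as an event handler. The following is a minimal behavioural sketch, not the patented hardware. Several details are assumptions made for illustration: the class and method names, the default retry threshold of 3, the exception raised when the CAM is full, and the lookup of one record per address.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Record:
    valid: bool = False
    address: int = 0
    war: bool = False      # read-modify-write correlation flag
    retry_cnt: int = 0

class RetransmitCAM:
    """Hypothetical model of steps 2)-5); step 1) is the caller's dispatch."""

    def __init__(self, n: int = 64, retry_limit: int = 3):
        self.records = [Record() for _ in range(n)]
        self.retry_limit = retry_limit  # assumed "preset threshold"

    def _lookup(self, addr: int) -> Optional[Record]:
        # Assumption: the first valid record with this read address is used.
        return next((r for r in self.records if r.valid and r.address == addr), None)

    def on_read(self, addr: int) -> None:  # step 2)
        free = next((r for r in self.records if not r.valid), None)
        if free is None:
            raise RuntimeError("no free record (stall policy is not specified)")
        free.valid, free.address, free.war, free.retry_cnt = True, addr, False, 0

    def on_write(self, addr: int) -> None:  # step 3)
        # The write itself continues to memory unblocked; only flags change.
        for r in self.records:
            if r.valid and r.address == addr:
                r.war = True

    def on_response(self, addr: int, uncorrectable: bool) -> str:  # steps 4)-5)
        rec = self._lookup(addr)
        if rec is None:
            raise KeyError(f"no outstanding read for {addr:#x}")
        if not uncorrectable:
            rec.valid = False          # step 4): deliver data, free the record
            return "deliver"
        if not rec.war and rec.retry_cnt < self.retry_limit:
            rec.retry_cnt += 1         # step 5): retransmit
            return "retransmit"
        return "give_up"               # step 5): "end and exit"

cam = RetransmitCAM(n=4, retry_limit=2)
cam.on_read(0x100)
print(cam.on_response(0x100, uncorrectable=True))   # retransmit
print(cam.on_response(0x100, uncorrectable=False))  # deliver
cam.on_read(0x200)
cam.on_write(0x200)                                 # same-address write while in flight
print(cam.on_response(0x200, uncorrectable=True))   # give_up
```

The patent text does not cover how the system reacts after the "give_up" outcome, for example by reporting a machine-check error.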
Compared with the prior art, the invention has the following advantages. The CAM supports a read request retransmission mechanism based on a correlation CAM. This mechanism can tolerate inter-board soft errors between the main processor and off-chip memory to the greatest extent possible while affecting system performance as little as possible. It is simple in structure, easy to implement and non-blocking, and tolerates inter-board soft errors with both high performance and high efficiency.
Drawings
FIG. 1 is a schematic diagram of a CAM according to an embodiment of the present invention.
Detailed Description
One of the main difficulties in designing the read request retransmission mechanism is retransmitting a request without violating the dependences between read and write requests. Specifically, the data stored at the target address of a read request that is to be retransmitted must not be modified by a write request in two windows: between the initial read and the first retransmitted read, and between any retransmitted read and the previous one. To guarantee that the data returned by a retransmitted read request is correct, this embodiment designs a CAM that determines this correlation and supports a soft error retransmission mechanism between the processor and the memory board.
As shown in FIG. 1, the CAM (content-addressable memory) of this embodiment, supporting a soft error retransmission mechanism between a processor and a memory board, comprises:
a record storage unit comprising n records, each record containing a valid flag (labelled V in FIG. 1), a read address (labelled Address in FIG. 1), a read-modify-write correlation flag (labelled WAR in FIG. 1) and a retransmission count (labelled Retry Cnt in FIG. 1);
a comparator array comprising n comparators in one-to-one correspondence with the records in the record storage unit. Each comparator has two inputs: one is the write address and the other is the read address in the corresponding record. The control (enable) signal of each comparator is the valid flag of the corresponding record;
and a read-modify-write correlation judgment module, configured to determine from the outputs of the n comparators whether a read-modify-write correlation exists.
As an optional implementation, in this embodiment the record storage unit contains 64 records in total, so at most 64 outstanding read requests can be tracked.
As an optional implementation, in this embodiment the valid flag and the read-modify-write correlation flag are each 1 bit wide.
As an optional implementation, in this embodiment the retransmission count is 8 bits wide.
In this embodiment, the contents of each record are shown in Table 1:
Table 1: Contents of a record in the record storage unit.
Field | Purpose |
---|---|
Valid flag | Indicates that the record is in use and the information stored in it is valid. |
Read address | Stores the address of the read request. It is the key that associates the read request with subsequent write requests and with the read response data. |
Read-modify-write correlation flag (WAR) | Records whether a read-modify-write correlation exists, i.e. whether another write request already sent to off-chip memory modified the contents of this address before the read response data returned. |
Retransmission count (Retry Cnt) | Stores the number of retransmissions performed. |
To tolerate soft errors between the processor and the memory, this embodiment further designs a soft-error-tolerant read request retransmission mechanism that does not block write requests. The mechanism checks the received read response data, typically with an ECC code that corrects single-bit errors and detects double-bit errors. If the received read data contains an uncorrectable error, the mechanism retransmits the read request until error-free data is received or the maximum number of attempts is reached. Because inter-board soft errors are transient, this mechanism can tolerate them with high probability. The read request retransmission mechanism is the application method of the CAM supporting the soft error retransmission mechanism between the processor and the memory board, and comprises the following steps:
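The check in step 4) is described only as ECC that typically corrects one error and detects two. The sketch below makes the "uncorrectable" outcome concrete with an extended Hamming (8,4) SECDED code over a 4-bit value. This small code is a stand-in chosen for readability; DDR ECC usually protects 64-bit words with 8 check bits. The function names are hypothetical.

```python
from typing import List, Optional, Tuple

def encode(nibble: int) -> List[int]:
    """Extended Hamming (8,4): index 0 is overall parity, 1/2/4 are Hamming
    parity bits, 3/5/6/7 carry the data bits d0..d3."""
    code = [0] * 8
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2
    return code

def check(code: List[int]) -> Tuple[str, Optional[int]]:
    """Return ("ok" | "corrected" | "uncorrectable", decoded nibble or None)."""
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos          # XOR of positions holding a 1
    overall = sum(code) % 2          # 0 for a valid codeword
    if overall == 1:                 # odd number of flips: assume one, fix it
        code = code.copy()
        code[syndrome] ^= 1          # syndrome 0 means bit 0 itself flipped
        status = "corrected"
    elif syndrome != 0:              # even parity but bad syndrome: two flips
        return "uncorrectable", None
    else:
        status = "ok"
    data = code[3] | code[5] << 1 | code[6] << 2 | code[7] << 3
    return status, data

cw = encode(0b1011)
one = cw.copy(); one[6] ^= 1                 # single-bit soft error
two = cw.copy(); two[2] ^= 1; two[5] ^= 1    # double-bit soft error
print(check(cw), check(one), check(two))
# ('ok', 11) ('corrected', 11) ('uncorrectable', None)
```

In the mechanism above, only the "uncorrectable" outcome leads to step 5). Responses that are "ok" or "corrected" are returned directly.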
1) Wait for a memory access request. If it is a read request, go to step 2); if it is a write request, go to step 3); if it is read response data, go to step 4);
2) Create a new record in the record storage unit and fill the access address of the read request into its read address. Set the new record's valid flag to 1, its read-modify-write correlation flag to 0 and its retransmission count to 0. Return to step 1);
3) For the write address of the write request, use the comparator array and the read-modify-write correlation judgment module to determine whether the write request has a read-modify-write correlation with any record whose valid flag is 1. If so, set the read-modify-write correlation flag of the corresponding record to 1. Then return to step 1);
4) Check the received read response data. If the check finds no uncorrectable error, return the correct read response data, set the valid flag of the record corresponding to the read address of the read response data to 0, and return to step 1). Otherwise, go to step 5);
5) For the read request record corresponding to the read response data, determine whether its read-modify-write correlation flag is 0 and its retransmission count is below a preset threshold. If so, send a retransmission request, increment the retransmission count of that record by 1, and return to step 1). Otherwise, end and exit.
Table 2 summarizes, from steps 1) to 5), when each CAM field is set or updated and when it is cleared.
Table 2: When the CAM fields are set and cleared.
Field | Set to 1 / updated when | Cleared to 0 when |
---|---|---|
Valid flag | A read request is received | The read response is returned |
Read address | A read request is received | Not cleared |
Read-modify-write correlation flag | A write request is received and is correlated | A read request is received; the retransmission decision is made |
Retransmission count | Incremented by 1 after a decision to retransmit | A read request is received; the retransmission decision is made |
The retransmission condition in step 5) is that the read-modify-write correlation flag is 0 and the retransmission count is below the preset threshold. In the field names of Table 1, a request is retransmitted when WAR is 0 and Retry Cnt has not reached the threshold. Table 3 lists the possible cases.
Table 3: Retransmission decision table.
Case | WAR | Retry Cnt at threshold | Decision |
---|---|---|---|
Case 1 | 1 | / (don't care) | No retransmission |
Case 2 | 0 | 1 | No retransmission |
Case 3 | 0 | 0 | Retransmit |
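Table 3 reduces to a single Boolean condition, and the sketch below encodes it directly. The threshold value of 4 is a hypothetical choice, since the patent leaves the preset threshold open.

```python
def should_retransmit(war: int, retry_cnt: int, threshold: int) -> bool:
    """Step 5) rule: retransmit only if WAR == 0 and Retry Cnt < threshold."""
    return war == 0 and retry_cnt < threshold

THRESHOLD = 4  # hypothetical preset threshold
cases = {
    "Case 1": should_retransmit(war=1, retry_cnt=0, threshold=THRESHOLD),          # WAR set: count irrelevant
    "Case 2": should_retransmit(war=0, retry_cnt=THRESHOLD, threshold=THRESHOLD),  # limit reached
    "Case 3": should_retransmit(war=0, retry_cnt=1, threshold=THRESHOLD),          # safe to retry
}
print(cases)  # {'Case 1': False, 'Case 2': False, 'Case 3': True}
```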
The application method of this embodiment uses a read request retransmission mechanism based on a correlation CAM. It can tolerate inter-board soft errors between the main processor and off-chip memory to the greatest extent possible while affecting system performance as little as possible.
In terms of performance, the application method of this embodiment never blocks write requests from the request source, so the fault-tolerance mechanism has little impact on system performance. The cost is that a read request cannot be retransmitted if a write to the same target address occurs between the initial request and a retransmitted request, or between two consecutive retransmitted requests. In that case the inter-board soft error cannot be tolerated. Such write-after-read dependences between nearby requests are rare in practice. Modern high-performance processors typically have several levels of cache (L1, L2 and often L3) that exploit temporal locality, i.e. reuse of the same data, as far as possible. Repeated accesses to the same address are therefore mostly absorbed by the caches before they reach memory. Consequently, the probability that a read request sent to memory cannot be retransmitted because of a write to the same target address is low. In summary, the application method of this embodiment achieves both high performance and high efficiency in tolerating inter-board soft errors.
The above description is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above embodiments, and all technical solutions belonging to the idea of the present invention belong to the protection scope of the present invention. It should be noted that modifications and embellishments within the scope of the invention may occur to those skilled in the art without departing from the principle of the invention, and are considered to be within the scope of the invention.
Claims (4)
1. A CAM for supporting a soft error retransmission mechanism between a processor and a memory board, comprising:
the record storage unit comprises n records, each record containing a valid flag, a read address, a read-modify-write correlation flag and a retransmission count;
the comparator array comprises n comparators in one-to-one correspondence with the records in the record storage unit. Each comparator has two inputs: one is the write address and the other is the read address in the corresponding record. The control (enable) signal of each comparator is the valid flag of the corresponding record;
the read-modify-write correlation judgment module is configured to determine from the outputs of the n comparators whether a read-modify-write correlation exists;
the application method of the CAM supporting the soft error retransmission mechanism between the processor and the memory board comprises the following steps:
1) Wait for a memory access request. If it is a read request, go to step 2); if it is a write request, go to step 3); if it is read response data, go to step 4);
2) Create a new record in the record storage unit and fill the access address of the read request into its read address. Set the new record's valid flag to 1, its read-modify-write correlation flag to 0 and its retransmission count to 0. Return to step 1);
3) For the write address of the write request, use the comparator array and the read-modify-write correlation judgment module to determine whether the write request has a read-modify-write correlation with any record whose valid flag is 1. If so, set the read-modify-write correlation flag of the corresponding record to 1. Then return to step 1);
4) Verify the received read response data. If verification finds no uncorrectable error, return the correct read response data, set the valid flag of the record corresponding to the read address of the read response data to 0, and return to step 1). Otherwise, go to step 5);
5) For the read request record corresponding to the read response data, determine whether its read-modify-write correlation flag is 0 and its retransmission count is below a preset threshold. If so, send a retransmission request, increment the retransmission count of that record by 1, and return to step 1). Otherwise, end and exit.
2. The CAM of claim 1, wherein the record storage unit comprises 64 records.
3. The CAM for supporting a soft error retransmission mechanism between a processor and a memory board according to claim 1, wherein the valid flag and the read-modify-write correlation flag are each 1 bit wide.
4. The CAM for supporting a soft error retransmission mechanism between a processor and a memory board according to claim 1, wherein the bit width of the retransmission count is 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011043590.8A CN112181703B (en) | 2020-09-28 | 2020-09-28 | CAM supporting soft error retransmission mechanism between capacity processor and memory board and application method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112181703A CN112181703A (en) | 2021-01-05 |
CN112181703B true CN112181703B (en) | 2022-10-28 |
Family
ID=73946331
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011043590.8A Active CN112181703B (en) | 2020-09-28 | 2020-09-28 | CAM supporting soft error retransmission mechanism between capacity processor and memory board and application method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112181703B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113806108A (en) * | 2021-08-25 | 2021-12-17 | 海光信息技术股份有限公司 | Retransmission method, memory controller, processor system and electronic equipment |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0707268A2 (en) * | 1994-10-14 | 1996-04-17 | Compaq Computer Corporation | Easily programmable memory controller which can access different speed memory devices on different cycles |
US5634112A (en) * | 1994-10-14 | 1997-05-27 | Compaq Computer Corporation | Memory controller having precharge prediction based on processor and PCI bus cycles |
CN1571951A (en) * | 2001-08-23 | 2005-01-26 | 集成装置技术公司 | FIFO memory devices having single data rate (sdr) and dual data rate (ddr) capability |
CN102063340A (en) * | 2011-01-19 | 2011-05-18 | 西安交通大学 | Method for improving fault-tolerant capability of high-speed cache of magnetoresistance RAM (Random Access Memory) |
CN102103558A (en) * | 2009-12-18 | 2011-06-22 | 上海华虹集成电路有限责任公司 | Multi-channel NANDflash controller with write retransmission function |
CN105740168A (en) * | 2016-01-23 | 2016-07-06 | 中国人民解放军国防科学技术大学 | Fault-tolerant directory cache controller |
CN107888512A (en) * | 2017-10-20 | 2018-04-06 | 深圳市楠菲微电子有限公司 | Dynamic shared buffer memory and interchanger |
CN110727530A (en) * | 2019-09-12 | 2020-01-24 | 无锡江南计算技术研究所 | Error access memory request retransmission system and method based on window |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |