CN117992273A - Data processing method, device, electronic equipment and storage medium

Info

Publication number
CN117992273A
Authority
CN
China
Prior art keywords
data
reading
read
determining
storage space
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410137795.4A
Other languages
Chinese (zh)
Inventor
汪永鹏
谭越
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lenovo Beijing Ltd
Priority to CN202410137795.4A
Publication of CN117992273A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1008 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices
    • G06F11/1044 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices with specific ECC/EDC distribution
    • G06F11/1012 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices using codes or arrangements adapted for a specific type of error
    • G06F11/1016 Error in accessing a memory location, i.e. addressing error

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Techniques For Improving Reliability Of Storages (AREA)

Abstract

The disclosure provides a data processing method, a data processing device, electronic equipment and a storage medium. The data processing method comprises the following steps: determining a first parameter in response to reading first data from a first storage space under a first mechanism, wherein the first data does not include check bits and the first parameter characterizes the time taken to read the first data; and if the first parameter meets a first condition, determining that the data stored in the first storage space has undergone error correction during the reading process.

Description

Data processing method, device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of memory failure detection, and in particular, to a data processing method, apparatus, device, and readable medium.
Background
A single-bit failure in memory is a failure of one memory cell. Faults may also occur in the data link that carries data between the memory and the CPU. To address these problems, one of the more common schemes is memory with an error correction code (ECC) mechanism. The ECC mechanism is used to check and correct memory errors, which is critical to data security and memory reliability.
Disclosure of Invention
To solve the above problems, in a first aspect, an embodiment of the present disclosure provides a data processing method, including: determining a first parameter in response to reading the first data from the first storage space under the first mechanism; the first data does not include a check bit, and the first parameter characterizes a time when the first data is read; and if the first parameter meets a first condition, determining that the first data stored in the first storage space is subjected to error correction in the reading process.
According to an embodiment of the present disclosure, the data processing method further includes: if the first data is the data of one reading unit of the data to be read, and the time for reading the first data is detected to be longer than a first time, determining that the first parameter meets the first condition.
According to an embodiment of the present disclosure, the data processing method further includes: if the first data is the data to be read, and a target condition is detected in the reading times of the reading units, determining that the first parameter meets the first condition; the data to be read is read reading unit by reading unit.
According to an embodiment of the present disclosure, detecting that a target condition exists in the reading times of the reading units and determining that the first parameter meets the first condition includes: upon detecting that the difference between the reading times of two adjacent reading units is larger than a second time, determining that the first parameter meets the first condition.
According to an embodiment of the present disclosure, the two adjacent reading units include a first reading unit and a second reading unit; the first reading unit corresponds to the data of the M to N bit addresses in the first storage space, and the second reading unit corresponds to the data of the M+1 to N+1 bit addresses in the first storage space; determining that the first data stored in the first storage space is error-corrected in the reading process includes: if the reading time of the second reading unit is longer than that of the first reading unit, determining that the data of the (N+1)-th bit address in the first storage space has undergone error correction.
According to an embodiment of the present disclosure, the data processing method further includes: writing first identification data and second identification data into the first storage space respectively; in response to traversing and reading the first identification data from the first storage space under the first mechanism, determining a first set of data bits where error correction occurred according to the reading time of each reading unit; in response to traversing and reading the second identification data from the first storage space under the first mechanism, determining a second set of data bits where error correction occurred according to the reading time of each reading unit; and when the first storage space is read, if the first set of data bits and/or the second set of data bits are read, determining that the stored data is error-corrected during the reading process.
According to an embodiment of the present disclosure, the data processing method further includes: determining a second parameter in response to reading second data from a second storage space under a second mechanism, wherein the second data includes check bits and the second parameter characterizes a check result for the second data; and determining a faulty unit based on the second parameter, the faulty unit being either the second storage space or a data link used for reading the second data.
According to an embodiment of the present disclosure, determining a faulty unit based on the second parameter includes: if the second data is the data to be read, determining the corrected data bits in the second data based on the second parameter, the data to be read being read reading unit by reading unit; if different corrected data bits are transmitted through the same data link, determining the data link as the faulty unit; and if different corrected data bits are transmitted through different data links, determining the second storage space as the faulty unit.
Another aspect of the present disclosure provides a data processing apparatus, comprising: a reading module for determining a first parameter in response to reading the first data from the first storage space under a first mechanism; the first data does not include a check bit, and the first parameter characterizes a time when the first data is read; and the determining module is used for judging whether the first parameter meets a first condition, and if so, determining that the first data stored in the first storage space in the reading process is subjected to error correction.
Another aspect of the present disclosure also provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the data processing method as described above.
Another aspect of the present disclosure provides a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform a data processing method as above.
Drawings
For a more complete understanding of the present disclosure and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:
FIG. 1 schematically illustrates an application scenario 100 of a data processing method according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a data processing method according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates a schematic diagram of ECC error detection and correction under an on-chip ECC mechanism according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a flow chart of a data processing method according to another embodiment of the present disclosure;
FIG. 5 schematically illustrates a flow chart of a data processing method according to yet another embodiment of the present disclosure;
FIG. 6 schematically illustrates a schematic diagram of ECC error detection and correction under a sideband ECC mechanism in accordance with an embodiment of the present disclosure;
FIG. 7 schematically illustrates a block diagram of a data processing apparatus according to an embodiment of the present disclosure; and
FIG. 8 shows a schematic block diagram of an example electronic device 700 that may be used to implement the methods of embodiments of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is only exemplary and is not intended to limit the scope of the present disclosure. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.
Some of the block diagrams and/or flowchart illustrations are shown in the figures. It will be understood that some blocks of the block diagrams and/or flowchart illustrations, or combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the instructions, when executed by the processor, create means for implementing the functions/acts specified in the block diagrams and/or flowchart.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present disclosure. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, the different embodiments or examples described in this specification, and the features of the different embodiments or examples, may be combined by those skilled in the art provided they do not contradict one another.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present disclosure, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
Fig. 1 schematically illustrates an application scenario 100 of a data processing method according to an embodiment of the present disclosure. It should be noted that fig. 1 is merely an example of a scenario in which embodiments of the present disclosure may be applied to assist those skilled in the art in understanding the technical content of the present disclosure, but does not mean that embodiments of the present disclosure may not be used in other devices, systems, environments, or scenarios.
As shown in fig. 1, the implementation purpose of the application scenario 100 is: locating a failed memory cell on a memory chip and determining whether it is a single bit failure due to a link problem or a failure due to a memory cell.
The memory chip 110 is communicatively or electrically coupled to a controller 120 (CPU). A plurality of memory cells 111 are allocated on the memory chip 110, and the controller 120 may write actual data into the memory cells 111 or read actual data from them. In the present application scenario 100, the memory chip 110 may be a double data rate synchronous dynamic random access memory (DDR) chip, such as DDR3, DDR4 or DDR5, or a low-power double data rate (LPDDR) memory chip, such as LPDDR3, LPDDR4 or LPDDR5.
If a memory cell 111 fails, a single-bit error occurs when data at the corresponding address bit is read or written; ECC is used to detect and correct single-bit errors that occur while the memory cell 111 is being read or written.
Currently, ECC mechanisms are generally divided into two categories: on-chip ECC (On Die ECC) and sideband ECC (Side-band ECC). Under the On die ECC mechanism, the memory transmits already-corrected data to the CPU. The CPU only receives the data and has no data checking capability, so it cannot know whether the received data has been corrected. In other words, under the On die ECC mechanism the CPU cannot sense the existence of faults, which negatively affects the reliability and the read-write performance of the memory. Under the Side-band ECC mechanism, the CPU has data checking capability: it receives data with check bits, performs checking and error correction, and can know which of the received data bits were corrected. In other words, under the Side-band ECC mechanism the CPU can sense the existence of faults, but cannot tell whether the error arose in the transmission channel or inside the memory.
Under the On die ECC mechanism, the memory chip 110 is an on-chip ECC memory chip, and a single-bit error that occurs during memory reads and writes because of a failed storage unit 111 is detected and corrected inside the chip. Because of the memory prefetch mechanism, each read operation prefetches the n addresses following the requested address, and with the data bit width n corresponding to each address this yields 128 bits of data. An 8-bit ECC check code is generated for the 128 bits of data; if 1 bit of the 128 bits is abnormal because of a faulty storage unit, the On die ECC function can correct the abnormal bit using the 8-bit ECC check code (Error Correction). As another example, under a sideband ECC mechanism, the controller 120 may use additional ECC check chips to check and correct the received data.
Under the On die ECC mechanism, all On die ECC operations are completed automatically inside the memory chip 110 by hardware circuits, and the controller 120 cannot know whether the acquired data has been corrected. If the memory cell 111 is normal, the ECC function on the memory chip 110 does not need to perform Error Correction when the memory is read, so the delay of the read operations is consistent and normal. If the memory cell 111 has failed, the ECC function on the memory chip 110 must perform additional Error Correction during the read operation, which increases the delay of the read operation. Based on this, in order to know whether the acquired data has been corrected and which address bits were corrected, the controller 120 may measure the delay of consecutive read operations on the memory chip 110 and determine from the delay whether error correction occurred and where.
Under the Side-band ECC mechanism, the memory chip 110 is a sideband ECC memory chip. When the controller 120 finds that the received data contains an error and corrects it through ECC checking, the position (DQx) of the corrected data bit is recorded, and whether the fault is located in the data link or in a storage unit 111 of the memory chip 110 can then be determined according to DQx.
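The DQx-based decision can be sketched as follows, using the rule given in the summary above: corrected bits that all arrive over the same data link point to the link, while corrected bits spread over different links point to the storage space. This is a minimal illustration only; the function name, the string labels and the "undetermined" result for fewer than two corrected bits are assumptions, not part of the disclosure.

```python
from typing import Sequence


def classify_fault_unit(corrected_dq_lines: Sequence[str]) -> str:
    """Classify the faulty unit from the DQ lines of the corrected bits.

    corrected_dq_lines holds one entry (e.g. "DQ3") per corrected data bit
    observed while reading the data, reading unit by reading unit.
    """
    # Assumption: with fewer than two corrected bits there is nothing to compare.
    if len(corrected_dq_lines) < 2:
        return "undetermined"
    # Same link for every corrected bit -> the data link is the faulty unit.
    if len(set(corrected_dq_lines)) == 1:
        return "data_link"
    # Corrected bits on different links -> the storage space is the faulty unit.
    return "storage_space"


print(classify_fault_unit(["DQ3", "DQ3", "DQ3"]))  # data_link
print(classify_fault_unit(["DQ3", "DQ5"]))         # storage_space
```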
It should be understood that the number and type of memory chips 110, controllers 120 in fig. 1 are merely illustrative. There may be any number and type of memory chips 110, controllers 120, as desired for implementation.
Fig. 2 schematically illustrates a flow chart of a data processing method according to an embodiment of the present disclosure.
As shown in fig. 2, the data processing method 200 includes operations S210 to S220.
In operation S210, a first parameter is determined in response to reading first data from a first storage space under a first mechanism.
In embodiments of the present disclosure, the first mechanism may refer to an On Die ECC mechanism, i.e., an error check correction mechanism for an On-chip ECC memory bank. The first storage space may be a plurality of storage units 111 in an on-chip ECC memory chip.
Fig. 3 schematically illustrates a schematic diagram of on-chip ECC mechanism ECC error detection and correction in accordance with an embodiment of the present disclosure.
As shown in fig. 3, for an on-chip ECC memory chip, the ECC mechanism runs on the memory die itself: the On die ECC operations are completed automatically inside the memory chip by hardware circuits, and the CPU has no checking capability for on-chip ECC memory and can only receive data. For example, the CPU acquires 8 bits of data from each of Die0 to Die7, 64 bits in total; the data includes only the actual data stored in the ECC memory and no ECC check bits. The plurality of storage units 111 corresponding to the first storage space may be used to store data to be read that does not include check bits, that is, the first data acquired by the CPU does not include check bits. The first parameter characterizes the time to read the first data, i.e., the time elapsed from the start of the read to its successful completion, e.g., 0.001 ms, 0.002 ms or 0.003 ms. It should be understood that the number of dies in fig. 3 is merely illustrative and is not intended to limit the present disclosure.
In operation S220, if the first parameter satisfies the first condition, it is determined that the first data stored in the first storage space was error-corrected during the reading process.
According to the embodiment of the disclosure, whether the data stored in the memory chip has been error-corrected is judged according to the time taken to read the first data. Although the read first data contains no check bits and the CPU (central processing unit) has no checking capability, the CPU can infer from the first parameter whether additional error correction inside the memory chip has increased the read delay, and thereby determine whether the memory chip performed error correction. This solves the problem that, under the On die ECC mechanism, the controller cannot know whether the acquired data has been corrected.
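As a rough illustration of how a first parameter of this kind could be obtained, the sketch below times a single read and reports the elapsed time in milliseconds. The `timed_read` helper and the lambda standing in for a memory read are hypothetical; a real measurement would need a hardware-level timer and uncached accesses, which this sketch does not attempt.

```python
import time
from typing import Callable, Tuple


def timed_read(read_unit: Callable[[], bytes]) -> Tuple[bytes, float]:
    """Run one read and return (data, elapsed time in milliseconds)."""
    start_ns = time.perf_counter_ns()
    data = read_unit()
    elapsed_ms = (time.perf_counter_ns() - start_ns) / 1_000_000
    return data, elapsed_ms


# Hypothetical stand-in for reading one 8-bit reading unit from memory.
data, elapsed_ms = timed_read(lambda: bytes([0xA5]))
print(data.hex(), elapsed_ms)
```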
Further, in an embodiment of the disclosure, if the first data is the data of one reading unit of the data to be read, and the time to read the first data is detected to be greater than the first time, it is determined that the first parameter satisfies the first condition.
In some application scenarios, the first data acquired by the CPU is the data of one reading unit of the data to be read, and the time for acquiring the data of one reading unit may be constant, for example always 0.001 ms. The first time may be the reading time of the data of one reading unit when the memory unit is normal. In this scenario, the first parameter satisfying the first condition may be understood as the time to read the first data from the first storage space being longer than the first time. For example, each reading unit may read 8 bits of data, the normal reading time is 0.001 ms, and the first time is set to 0.001 ms. If the time to read the first data from the first storage space is 0.0015 ms, then 0.0015 ms is greater than 0.001 ms, so the first parameter meets the first condition, and it is determined that the first data stored in the first storage space was error-corrected during the reading process.
With this embodiment of the disclosure, when the first data is the data of one reading unit of the data to be read, whether the memory chip performed error correction is determined directly from the reading time of that single reading unit, and thus whether the data of the single reading unit was error-corrected is determined.
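The single-reading-unit check amounts to one comparison against the first time. A minimal sketch, assuming times are expressed in milliseconds and using the 0.001 ms example value as the default; the function name is illustrative only:

```python
def meets_first_condition(read_time_ms: float, first_time_ms: float = 0.001) -> bool:
    """True if the read took longer than the normal (first) time."""
    return read_time_ms > first_time_ms


print(meets_first_condition(0.0015))  # True: error correction is inferred
print(meets_first_condition(0.001))   # False: read time is normal
```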
Further, in another embodiment of the present disclosure, if the first data is the data to be read, and a target condition is detected in the reading times of the reading units, it is determined that the first parameter satisfies the first condition.
The data to be read is read reading unit by reading unit. The data may be read in either of the following two modes:
Mode one: for example, the data to be read includes 128 bits, corresponding to address bits 0-127 in the first storage space, and each reading unit can read 8 bits of data. The first read fetches the data of address bits 0-7 in the first storage space, the second read fetches the data of address bits 8-15, the third read fetches the data of address bits 16-23, and so on; the last read fetches the data of address bits 120-127, for a total of 16 reads. During the reading process, the corresponding reading times t1, t2, t3, …, t16 are recorded. If the target condition exists in t1, t2, t3, …, t16, it is determined that the first parameter meets the first condition; otherwise, it is determined that the first parameter does not meet the first condition.
Mode two: for example, the data to be read includes bits 0-127 (128 bits), corresponding to address bits 0-127 in the first storage space, and each reading unit can read 8 bits of data. The first read fetches the data of address bits 0-7 in the first storage space, the second read fetches the data of address bits 1-8, the third read fetches the data of address bits 2-9, and so on; the last read fetches the data of address bits 120-127, for a total of 121 reads. During the reading process, the corresponding reading times t1, t2, t3, …, t121 are recorded. If the target condition exists in t1, t2, t3, …, t121, it is determined that the first parameter meets the first condition; otherwise, it is determined that the first parameter does not meet the first condition.
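The two reading modes differ only in how far the 8-bit reading unit advances between reads. The sketch below enumerates the address-bit ranges for both modes, assuming every window must stay within address bits 0-127; the `read_windows` helper and its parameter names are illustrative.

```python
from typing import List, Tuple


def read_windows(total_bits: int = 128, unit_bits: int = 8,
                 stride: int = 8) -> List[Tuple[int, int]]:
    """Return (first_bit, last_bit) for each reading unit, inclusive."""
    last_start = total_bits - unit_bits
    return [(start, start + unit_bits - 1)
            for start in range(0, last_start + 1, stride)]


mode_one = read_windows(stride=8)  # non-overlapping reading units
mode_two = read_windows(stride=1)  # reading unit slides by one address bit

print(mode_one[:3], mode_one[-1], len(mode_one))
print(mode_two[:3], mode_two[-1], len(mode_two))
```

Under these assumptions, mode one yields 16 windows and mode two yields 121, each ending at address bits 120-127.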
In both of the above continuous-reading modes, the time the CPU takes for each read is not necessarily constant, so it may not be possible to detect error correction from the reading time of a single reading unit as described above. Based on this, the target condition may be that the difference between the reading times of two adjacent reading units is greater than the second time. That is, detecting that the target condition exists in the reading times of the reading units and determining that the first parameter satisfies the first condition may include: upon detecting that the difference between the reading times of two adjacent reading units is greater than the second time, determining that the first parameter satisfies the first condition.
The second time may be a maximum difference allowed between the read times of two adjacent read units in the case where the memory unit is normal, for example, 0.0001ms.
Suppose the reading time corresponding to the first reading unit is t1 and the reading time corresponding to the second reading unit is t2. If t2 is greater than t1, the reading time difference between the second reading unit and the first reading unit is t2-t1. For example, if t1 is 0.001 ms and t2 is 0.0015 ms, then t2-t1 is 0.0005 ms, which is greater than 0.0001 ms, so the target condition exists in the reading times of the reading units.
It should be noted that the two data reading modes have the following in common: in both, whether the data was error-corrected can be determined from the difference between the reading times of two adjacent reading units. They differ in that mode one can only determine whether the data was error-corrected, whereas mode two can also determine the specific address bits where error correction occurred, as described later.
According to the embodiment of the disclosure, when the first data is all of the data to be read, whether the memory chip performed error correction is determined according to the difference between the reading times of two adjacent reading units, and thus whether any of the data to be read was error-corrected is determined.
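The adjacent-difference test can be sketched as a scan over consecutive pairs of recorded reading times. The function name is illustrative, and the use of the absolute difference is an assumption: the example in the text only covers the case where the later time is larger.

```python
from typing import Sequence


def target_condition_exists(read_times_ms: Sequence[float],
                            second_time_ms: float = 0.0001) -> bool:
    """True if any two adjacent reading times differ by more than second_time_ms."""
    pairs = zip(read_times_ms, read_times_ms[1:])
    # Assumption: a jump in either direction counts as the target condition.
    return any(abs(later - earlier) > second_time_ms for earlier, later in pairs)


print(target_condition_exists([0.001, 0.0015]))        # True (difference 0.0005 ms)
print(target_condition_exists([0.001, 0.001, 0.001]))  # False
```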
Further, in the embodiment of the present disclosure, the first reading unit may correspond to the data of the M to N bit addresses in the first storage space, and the second reading unit may correspond to the data of the M+1 to N+1 bit addresses in the first storage space, where N is greater than M.
In operation S220, determining that the first data stored in the first storage space was error-corrected during the reading process includes: if the reading time of the second reading unit is longer than the reading time of the first reading unit, determining that the data of the (N+1)-th bit address in the first storage space was error-corrected.
The data reading mode here is mode two described above. For example, the first reading unit may correspond to the data of the 0th to 7th bit addresses in the first storage space, and the second reading unit may correspond to the data of the 1st to 8th bit addresses in the first storage space. The reading time t1 corresponding to the first reading unit is 0.001 ms and the reading time t2 corresponding to the second reading unit is 0.0015 ms. Since t2 is greater than t1, it is determined that a faulty unit exists at the 8th bit address in the first storage space (the last address bit of the addresses corresponding to the second reading unit), and that the data of the 8th bit address was error-corrected.
With embodiments of the present disclosure, the address bits where error correction occurs can be quickly determined by comparing the reading times of two adjacent reading units.
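The localization step can be illustrated with a short scan over the reading times recorded in the sliding (one-address-bit stride) mode: whenever a reading unit takes longer than the one before it, the address bit that newly entered the window is reported. This is a sketch under stated assumptions; the function name and the optional `margin_ms` tolerance are not from the source, and a fault that already lies inside the very first window produces no rise and is not covered.

```python
from typing import List, Sequence


def corrected_addresses(read_times_ms: Sequence[float], unit_bits: int = 8,
                        margin_ms: float = 0.0) -> List[int]:
    """Reading unit i covers address bits i .. i + unit_bits - 1."""
    found = []
    for i in range(1, len(read_times_ms)):
        # A rise relative to the previous unit implicates the newly added last bit.
        if read_times_ms[i] - read_times_ms[i - 1] > margin_ms:
            found.append(i + unit_bits - 1)
    return found


# Simulated times: a fault at address bit 8 slows every window that contains it
# (windows starting at bits 1 through 8).
times = [0.001] + [0.0015] * 8 + [0.001] * 3
print(corrected_addresses(times))  # [8]
```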
Fig. 4 schematically shows a flow chart of a data processing method according to another embodiment of the present disclosure.
As shown in fig. 4, the data processing method further includes operations S410 to S440.
In operation S410, first identification data and second identification data are written to the first storage space, respectively.
The first identification data may be, for example, 1 and the second identification data may be, for example, 0. Or the first identification data may be, for example, 0 and the second identification data may be, for example, 1.
In operation S420, in response to traversing and reading the first identification data from the first storage space under the first mechanism, the first set of data bits at which error correction occurred is determined according to the read time of each read unit.
For example, all 1s are written into the memory cells of the memory chip 110, and the memory chip 110 is traversed by memory address bit. The time from when the CPU sends each instruction to the memory chip 110 until the data from the memory chip 110 is received is recorded as the read time of the corresponding read unit. Every pair of adjacent read units is then checked one by one to see whether the read time of the later read unit is greater than that of the earlier read unit. If so, it is determined that the last address bit corresponding to the later read unit has a faulty cell and that the data at that address bit was error corrected. All address bits at which error correction occurred constitute the first set of data bits.
In operation S430, in response to traversing and reading the second identification data from the first storage space under the first mechanism, the second set of data bits at which error correction occurred is determined according to the read time of each read unit.
For example, all 0s are written into the memory cells of the memory chip 110, and the memory chip 110 is traversed by memory address bit. The time from when the CPU sends each instruction to the memory chip 110 until the data from the memory chip 110 is received is recorded as the read time of the corresponding read unit. Every pair of adjacent read units is then checked one by one to see whether the read time of the later read unit is greater than that of the earlier read unit. If so, it is determined that the last address bit corresponding to the later read unit has a faulty cell and that the data at that address bit was error corrected. All address bits at which error correction occurred constitute the second set of data bits.
In operation S440, when the first storage space is read, if data bits of the first set and/or the second set are read, it is determined that the stored data was error corrected during the reading process.
According to the embodiments of the present disclosure, a faulty memory cell may always read 1, always read 0, or change randomly. Detection is therefore performed in two states, with all 1s and with all 0s written into the memory block, and the two detection results are then fused, so that the address bits at which error correction occurs can be determined more effectively and accurately.
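Operations S420 to S440 can be illustrated with a minimal sketch. It assumes that the read times of the read units have already been collected into a list, that read unit i covers addresses i * unit_size to (i + 1) * unit_size - 1, and that fusing the two results means taking their union. The function names, the timing values, and the optional margin min_delta (which plays the role of the second time in claim 4) are hypothetical and are not taken from the disclosure.

```python
from typing import List, Set


def corrected_addresses(read_times: List[float], unit_size: int,
                        min_delta: float = 0.0) -> Set[int]:
    """Return the last address of every read unit that was slower than its predecessor."""
    flagged: Set[int] = set()
    for i in range(1, len(read_times)):
        # Compare each read unit with the one immediately before it.
        if read_times[i] - read_times[i - 1] > min_delta:
            # Assumed layout: unit i ends at address (i + 1) * unit_size - 1.
            flagged.add((i + 1) * unit_size - 1)
    return flagged


def fuse(first_set: Set[int], second_set: Set[int]) -> Set[int]:
    """Fuse the all-1s and all-0s detection results (assumed here to be a union)."""
    return first_set | second_set


# Hypothetical read times, in nanoseconds, for four read units of 8 addresses each.
times_all_ones = [10.0, 10.0, 14.0, 10.0]
times_all_zeros = [10.0, 13.0, 10.0, 10.0]

first_set = corrected_addresses(times_all_ones, unit_size=8)
second_set = corrected_addresses(times_all_zeros, unit_size=8)
suspect_addresses = fuse(first_set, second_set)
```

With these values the all-1s pass flags address 23 and the all-0s pass flags address 15, so a later read that touches either address would be treated as error corrected.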
Fig. 5 schematically shows a flow chart of a data processing method according to a further embodiment of the present disclosure.
As shown in fig. 5, the data processing method 200 further includes operations S510 to S520.
In operation S510, a second parameter is determined in response to reading the second data from the second storage space under the second mechanism. The second data includes check bits, and the second parameter characterizes a check result for the second data.
In an embodiment of the present disclosure, the second mechanism may refer to a sideband ECC mechanism, and accordingly, the second storage space may be the plurality of storage units 111 in the sideband ECC memory chip.
Fig. 6 schematically illustrates a schematic diagram of ECC error detection and correction under a sideband ECC mechanism in accordance with an embodiment of the present disclosure.
As shown in fig. 6, for example, the CPU may acquire 8 bits of data from each of Die0 to Die7. If the data width is 8 bits (corresponding to Die0), 5 additional bits are needed for ECC error detection and correction. Each time the data width doubles, only one more ECC check bit is needed: 6 ECC bits for 16 data bits (corresponding to Die0 and Die1), 7 ECC bits for 32 data bits (corresponding to Die0 to Die3), 8 ECC bits for 64 data bits (corresponding to Die0 to Die7), and so on. In short, ECC allows memory errors to be tolerated and corrected, so that the system can continue to operate normally without being interrupted by them. It should be understood that the number of Die in fig. 6 is merely illustrative and is not intended to limit the present disclosure.
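The check-bit counts above (5, 6, 7, and 8 bits for 8, 16, 32, and 64 data bits) agree with a single-error-correcting, double-error-detecting (SECDED) Hamming code. The disclosure does not name the code it uses, so the following sketch rests on that assumption, and the function name secded_check_bits is hypothetical.

```python
def secded_check_bits(data_bits: int) -> int:
    """Number of check bits for a SECDED Hamming code over data_bits bits."""
    r = 0
    # Hamming bound for single-error correction: 2**r >= data_bits + r + 1.
    while (1 << r) < data_bits + r + 1:
        r += 1
    # One extra overall parity bit adds double-error detection.
    return r + 1


check_bits = {width: secded_check_bits(width) for width in (8, 16, 32, 64)}
```

Under this assumption, doubling the data width adds one check bit over the range shown, which is why the relative ECC overhead falls as the data width grows.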
Under the Side-band ECC mechanism, the CPU has checking capability: it can check the second storage space and judge whether the second data stored in the second storage space was error corrected during the reading process. That is, the second data may include check bits for error correction checking. The check result for the second data characterized by the second parameter may be either that the second data stored in the second storage space was error corrected or that it was not error corrected.
In operation S520, determining a faulty unit based on the second parameter; the faulty unit is the second storage space or the data link for reading the second data.
In this embodiment of the present disclosure, if the check result characterized by the second parameter is that the second data stored in the second storage space was error corrected, the faulty unit may be determined from the error correction information, that is, it may be determined whether the fault is located in the data link or in the memory storage unit.
It should be understood that if the verification result for the second data, which is characterized by the second parameter, is that the second data stored in the second storage space is not error corrected, it indicates that the second storage space is not faulty, so that the operation S520 need not be performed.
Further, determining the fault unit based on the second parameter in operation S520 may specifically include:
If the second data is the data to be read, the checked data bits in the second data are determined based on the second parameter, wherein the data to be read is read by read unit.
If different checked data bits are transmitted through the same data link, the data link is determined to be a faulty unit.
If different checked data bits are transmitted through different data links, the second storage space is determined to be a faulty unit.
For example, when the CPU receives data containing an error and corrects it through ECC checking and error correction, the position (DQx) of the erroneous data is recorded. If errors are reported consistently and all of them occur on the same DQx, it can be determined that the link DQx between the CPU and the memory has a signal problem, that is, the data link is the faulty unit. If errors are reported consistently but are distributed over different DQx, it can be determined that the fault lies in the memory storage unit rather than in a particular data link, that is, the second storage space is the faulty unit.
According to the embodiments of the present disclosure, under the Side-band ECC mechanism, the CPU judges whether a fault lies in the data link or in the storage unit according to the error-corrected address bits it has received over multiple reads, and can thereby determine the specific location causing a single-bit fault. This solves the problem in the conventional technology that, although the CPU can detect errors and use the additional ECC check chips to check and correct the received data, it cannot tell whether a single-bit fault is caused by a link problem or by the memory storage unit.
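The DQx-based judgment can be sketched as follows, assuming the position of each corrected error has already been recorded as an integer DQ index. The function name locate_fault, the returned labels, and the min_events threshold (a stand-in for the requirement that errors be reported consistently) are hypothetical.

```python
from collections import Counter
from typing import Iterable


def locate_fault(corrected_dq: Iterable[int], min_events: int = 2) -> str:
    """Classify a fault from the DQ positions of the corrected errors seen so far."""
    counts = Counter(corrected_dq)
    if sum(counts.values()) < min_events:
        # Too few corrected errors to call the fault stable.
        return "undetermined"
    if len(counts) == 1:
        # Every corrected error arrived on the same DQ line.
        return "data_link"
    # Corrected errors are spread over several DQ lines.
    return "storage_space"
```

For example, locate_fault([3, 3, 3]) yields "data_link", while locate_fault([1, 5, 9]) yields "storage_space".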
Fig. 7 schematically shows a block diagram of a data processing apparatus according to an embodiment of the present disclosure.
As shown in fig. 7, the data processing apparatus 700 includes a reading module 710 and a determining module 720.
According to some embodiments of the present disclosure, the data processing apparatus 700 may be used to implement the data processing method according to embodiments of the present disclosure described with reference to fig. 1 to 4.
The reading module 710 may perform, for example, operation S210 for determining a first parameter in response to reading first data from the first storage space under the first mechanism. The first data does not include a check bit, and the first parameter characterizes the time taken to read the first data.
The determining module 720 may perform, for example, operation S220, to determine whether the first parameter satisfies the first condition, and if yes, determine that the first data stored in the first storage space has been subjected to error correction during the reading process.
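As a software illustration only, the two modules could be arranged as in the sketch below. The class names, the injected read_unit and clock callables, and the threshold first_time (corresponding to the first time in claim 2) are hypothetical. Timing taken from user-space code is unlikely to resolve the latency of an in-memory correction, so an actual implementation would presumably measure closer to the memory controller.

```python
import time
from typing import Callable, Tuple


class ReadingModule:
    """Reads one read unit and reports how long the read took (the first parameter)."""

    def __init__(self, read_unit: Callable[[int], bytes],
                 clock: Callable[[], float] = time.perf_counter) -> None:
        self._read_unit = read_unit  # hypothetical accessor for the first storage space
        self._clock = clock          # injectable so that the sketch can be tested

    def read(self, address: int) -> Tuple[bytes, float]:
        start = self._clock()
        data = self._read_unit(address)
        return data, self._clock() - start


class DeterminingModule:
    """Judges whether the first parameter meets the first condition."""

    def __init__(self, first_time: float) -> None:
        self._first_time = first_time

    def error_corrected(self, read_time: float) -> bool:
        # First condition assumed here: the read took longer than the first time.
        return read_time > self._first_time
```

The clock is passed in rather than called directly so that the judgment logic can be exercised with fixed timestamps.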
According to further embodiments of the present disclosure, the data processing apparatus 700 may also be used to implement the data processing method according to embodiments of the present disclosure described with reference to fig. 5-6.
The reading module 710 may also perform, for example, operation S510 for determining a second parameter in response to reading second data from the second storage space under the second mechanism. The second data includes check bits, and the second parameter characterizes a check result for the second data.
The determining module 720 may also perform, for example, operation S520 for determining a faulty unit based on the second parameter; the faulty unit is the second storage space or the data link for reading the second data.
Any number of modules, sub-modules, units, sub-units, or at least some of the functionality of any number of the sub-units according to embodiments of the present disclosure may be implemented in one module. Any one or more of the modules, sub-modules, units, sub-units according to embodiments of the present disclosure may be implemented as split into multiple modules. Any one or more of the modules, sub-modules, units, sub-units according to embodiments of the present disclosure may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system-on-chip, a system-on-substrate, a system-on-package, an Application Specific Integrated Circuit (ASIC), or in any other reasonable manner of hardware or firmware that integrates or encapsulates the circuit, or in any one of or a suitable combination of three of software, hardware, and firmware. Or one or more of the modules, sub-modules, units, sub-units according to embodiments of the present disclosure may be at least partially implemented as computer program modules, which, when executed, may perform the corresponding functions.
For example, any number of the reading module 710 and the determining module 720 may be combined in one module/unit/sub-unit or any one of them may be split into a plurality of modules/units/sub-units. Or at least some of the functionality of one or more of these modules/units/sub-units may be combined with at least some of the functionality of other modules/units/sub-units and implemented in one module/unit/sub-unit. According to embodiments of the present disclosure, at least one of the reading module 710 and the determining module 720 may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or in hardware or firmware, such as any other reasonable way of integrating or packaging the circuits, or in any one of or a suitable combination of three of software, hardware, and firmware. Or at least one of the reading module 710 and the determining module 720 may be at least partly implemented as a computer program module which, when run, may perform the respective functions.
It should be noted that, in the embodiments of the present disclosure, the data processing apparatus portion corresponds to the data processing method portion, and its specific implementation details and technical effects are the same. For details, refer to the data processing method embodiments shown in figs. 1-5; they are not repeated here.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 8 shows a schematic block diagram of an example electronic device 800 that may be used to implement methods of embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 8, the apparatus 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the device 800 can also be stored. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
Various components in electronic device 800 are connected to I/O interface 805, including: an input unit 806 such as a keyboard, mouse, etc.; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, etc.; and a communication unit 809, such as a network card, modem, wireless communication transceiver, or the like. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 801 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 801 performs the respective methods and processes described above, such as a data processing method. For example, in some embodiments, the data processing method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 800 via ROM 802 and/or communication unit 809. When a computer program is loaded into RAM 803 and executed by computing unit 801, one or more steps of the data processing method described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the data processing method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system and addresses the high management difficulty and weak service scalability of traditional physical hosts and VPS ("Virtual Private Server", or "VPS" for short) services. The server may also be a server of a distributed system or a server that incorporates a blockchain.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (11)

1. A data processing method, comprising:
Determining a first parameter in response to reading the first data from the first storage space under the first mechanism; the first data does not include a check bit, and the first parameter characterizes a time when the first data is read;
and if the first parameter meets a first condition, determining that the first data stored in the first storage space is subjected to error correction in the reading process.
2. The method of claim 1, further comprising:
If the first data is the data of one reading unit of the data to be read, detecting that the time for reading the first data is longer than the first time, and determining that the first parameter meets a first condition.
3. The method of claim 1, further comprising:
If the first data are to-be-read data, detecting that the target condition exists in the reading time of each reading unit, and determining that the first parameter meets a first condition; the data to be read is read according to the reading unit.
4. A method according to claim 3, detecting that there is a target condition for the reading time of each reading unit, determining that the first parameter meets a first condition, comprising:
And detecting that the difference value of the reading time of two adjacent reading units is larger than the second time, and determining that the first parameter meets a first condition.
5. The method of claim 4, adjacent two read units comprising a first read unit and a second read unit; the first reading unit corresponds to the data of the M+1 to N+1 bit addresses in the first storage space;
The determining that the first data stored in the first storage space is error-corrected in the reading process includes:
And if the reading time of the second reading unit is longer than that of the first reading unit, determining that the data of the (N+1) th bit address in the first storage space is subjected to error correction.
6. A method according to claim 3, further comprising:
Writing first identification data and second identification data into the first storage space respectively;
In response to traversing and reading the first identification data from the first storage space under the first mechanism, determining a first set of data bits at which error correction occurred according to the reading time of each reading unit;
In response to traversing and reading the second identification data from the first storage space under the first mechanism, determining a second set of data bits at which error correction occurred according to the reading time of each reading unit;
When the first storage space is read, if data bits of the first set and/or the second set are read, determining that the stored data is subjected to error correction in the reading process.
7. The method of any one of claims 1 to 5, further comprising:
Determining a second parameter in response to reading the second data from the second storage space under the second mechanism; the second data includes a check bit, the second parameter characterizing a check result for the second data;
determining a faulty unit based on the second parameter; the failure unit is the second storage space or a data link for reading the second data.
8. The method of claim 7, wherein the determining a faulty unit based on the second parameter comprises:
If the second data is the data to be read, determining checked data bits in the second data based on the second parameter; the data to be read is read according to a reading unit;
if different checked data bits are transmitted through the same data link, determining the data link as a fault unit;
and if different checked data bits are transmitted through different data links, determining the second storage space as a fault unit.
9. A data processing apparatus comprising:
A reading module for determining a first parameter in response to reading the first data from the first storage space under a first mechanism; the first data does not include a check bit, and the first parameter characterizes a time when the first data is read;
And the determining module is used for judging whether the first parameter meets a first condition, and if so, determining that the first data stored in the first storage space in the reading process is subjected to error correction.
10. An electronic device, comprising:
One or more processors;
a memory for storing one or more programs,
Wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-8.
11. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method according to any of claims 1-8.
CN202410137795.4A 2024-01-31 2024-01-31 Data processing method, device, electronic equipment and storage medium Pending CN117992273A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410137795.4A CN117992273A (en) 2024-01-31 2024-01-31 Data processing method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117992273A true CN117992273A (en) 2024-05-07

Family

ID=90888334

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410137795.4A Pending CN117992273A (en) 2024-01-31 2024-01-31 Data processing method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117992273A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination