CN111552590B - Detection and recovery method and system for memory bit overturning of power secondary equipment - Google Patents

Info

Publication number
CN111552590B
Authority
CN
China
Prior art keywords
code
application program
check code
ecc
crc
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010299597.XA
Other languages
Chinese (zh)
Other versions
CN111552590A (en
Inventor
周华良
刘拯
郑玉平
李友军
张吉
邹志扬
张连生
张成彬
朱彬彬
戴欣欣
郑奕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nari Technology Co Ltd
NARI Nanjing Control System Co Ltd
Original Assignee
Nari Technology Co Ltd
NARI Nanjing Control System Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nari Technology Co Ltd, NARI Nanjing Control System Co Ltd
Priority to CN202010299597.XA
Publication of CN111552590A
Priority to PCT/CN2020/114368 (WO2021208341A1)
Application granted
Publication of CN111552590B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1008 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices
    • G06F11/1044 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices with specific ECC/EDC distribution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793 Remedial or corrective actions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1004 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's to protect a block of data words, e.g. CRC or checksum
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process
    • G06F11/1469 Backup restoration techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3034 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a storage system, e.g. DASD based or network based

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Mathematical Physics (AREA)
  • Detection And Correction Of Errors (AREA)
  • Techniques For Improving Reliability Of Storages (AREA)

Abstract

The invention discloses a method and a system for detecting and recovering from memory bit flipping in power secondary equipment, in the technical field of memory error correction, and aims to solve the technical problem in the prior art that memory bit flipping causes abnormal functions or results in power secondary equipment. The method comprises the following steps: when the application program is loaded, performing ECC check code calculation on the application program running area according to a preset ECC segment length to obtain a load-time ECC check code for the segment data of the application program; while the application program is running, performing ECC check code calculation on the segment data according to the preset ECC segment length to obtain a run-time ECC check code for the segment data; comparing the load-time ECC check code with the run-time ECC check code; and, if the comparison result indicates that a single-bit error has occurred in the segment data while the application program is running, correcting the erroneous bit.

Description

Detection and recovery method and system for memory bit overturning of power secondary equipment
Technical Field
The invention relates to a method and a system for detecting and recovering from memory bit flipping in power secondary equipment, and belongs to the technical field of memory error correction.
Background
Most modern secondary devices in power systems are embedded devices built from a large number of chips. Memory, one of their core components, contains millions of storage cells, each of which stores a "0" or a "1". Because memory temporarily holds the program being executed and its data, a data error inside the memory affects normal program operation and, in severe cases, may cause the whole system to fail. The reliability and fault tolerance of memory have therefore long been an active research topic in the industry.
Years of research in semiconductor and memory technologies have found that memory errors fall mainly into two classes: hard errors and soft errors. Hard errors recur; they are caused mainly by physical damage to memory cells in the memory chip or by faulty external connections, and can generally be resolved only by replacing hardware. Soft errors occur randomly and can be cleared by reloading the program. A soft error is usually an unexpected flip of a single data bit (0 -> 1 or 1 -> 0), and such errors occur with relatively high probability.
Soft errors are typically caused by single event effects. Single event effects (SEEs) are phenomena in which radiation particles (heavy particles, protons, neutrons, X-rays, gamma rays, alpha particles, and the like) originating from cosmic radiation and the terrestrial radiation environment cause serious damage to integrated circuits and even to electronic devices. When a radiation particle strikes the silicon material, direct or indirect ionization generates additional radiation-induced electron-hole pairs. These pairs can be separated by the electric field in the reverse-biased depletion layer of the device and collected efficiently through drift, so that extra charge accumulates in a sensitive region of the device. When enough charge has accumulated, a large voltage transient pulse is generated that can temporarily flip the voltage at a sensitive node of the circuit. In combinational circuits, such voltage transient pulses are called single event transients (SETs). SETs can propagate along a circuit to memory cells, such as static random access memory (SRAM) cells. If the appropriate conditions are met, an SET may cause a memory cell to capture erroneous timing information and change a stored bit.
Error correction code (ECC) detection technology can be used to address soft errors caused by single-bit flips in memory and NAND flash devices. The technology emerged after parity checking, is a more advanced means of checking and correcting storage errors, and is widely used in workstation and server products. In ECC, an additional check code computed from the data bits is stored alongside them: when data is written into memory, the corresponding ECC code is stored as well. When the stored data is read back, the stored ECC code is compared with the ECC code calculated in real time from the data being read; if the two codes differ, they are decoded to determine which bit of the data is wrong. The erroneous bit is then corrected and the memory controller releases the correct data. If the same erroneous data is read out again, the correction process is performed again.
In recent years, hardware ECC error detection and correction has been added to some embedded processors and digital signal processors (DSPs), for example the multi-core ARM processor AM572x and the DSP chip C665x from TI, and the fully programmable SoC UltraScale MPSoC recently introduced by Xilinx. Most existing bit-flip detection and recovery schemes, however, depend on such hardware units and protect against bit-flip errors at the hardware level. Hardware ECC requires an error detection and correction control module to be integrated into the memory controller of the processor, which greatly increases chip cost. As a result, most of the processors and RAM devices currently in wide use in power secondary equipment do not support hardware ECC detection and recovery, and bit flipping cannot be guarded against at the hardware level.
Disclosure of Invention
In view of the defects in the prior art, an object of the present invention is to provide a method and a system for detecting and recovering memory bit flipping of a power secondary device, so as to solve the technical problem in the prior art that a function or a result of a device is abnormal due to the memory bit flipping of the power secondary device.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
A detection and recovery method for memory bit flipping of power secondary equipment comprises the following steps:
when the application program is loaded, performing ECC check code calculation on the application program running area according to a preset ECC segment length to obtain a load-time ECC check code of the segment data of the application program;
while the application program is running, performing ECC check code calculation on the segment data of the application program according to the preset ECC segment length to obtain a run-time ECC check code of the segment data;
comparing the load-time ECC check code with the run-time ECC check code;
and if the comparison result indicates that a single-bit error has occurred in the segment data while the application program is running, correcting the erroneous bit.
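The steps above can be illustrated with a minimal software ECC sketch. The text does not specify which ECC code is used, so the sketch assumes a Hamming-style code consisting of the XOR of the 1-based positions of all set bits plus an overall parity bit; the function names ecc_code and check_segment are hypothetical.

```python
from typing import Tuple


def ecc_code(segment: bytes) -> Tuple[int, int]:
    """Return (position_xor, parity) for one segment.

    position_xor is the XOR of the 1-based positions of all set bits;
    parity is the XOR of all bits. (Assumed code, not taken from the source.)
    """
    position_xor = 0
    parity = 0
    for byte_index, byte in enumerate(segment):
        for bit in range(8):
            if (byte >> bit) & 1:
                position_xor ^= byte_index * 8 + bit + 1
                parity ^= 1
    return position_xor, parity


def check_segment(segment: bytearray, load_code: Tuple[int, int]) -> str:
    """Compare the run-time code with the load-time code; fix one flipped bit in place."""
    run_code = ecc_code(segment)
    syndrome = load_code[0] ^ run_code[0]
    parity_changed = load_code[1] ^ run_code[1]
    if syndrome == 0 and parity_changed == 0:
        return "ok"
    if parity_changed == 1 and 1 <= syndrome <= len(segment) * 8:
        # A single flip changes the parity, and the syndrome is its position.
        position = syndrome - 1
        segment[position // 8] ^= 1 << (position % 8)
        return "corrected"
    # Parity unchanged but syndrome non-zero: an even number of bits flipped.
    return "uncorrectable"


segment = bytearray(b"relay protection code")
load_code = ecc_code(segment)      # computed when the program is loaded
segment[3] ^= 0x10                 # simulate a single-bit flip at run time
print(check_segment(segment, load_code))
```

Under this assumed code a single flipped bit is located and repaired, and a two-bit error is reported as uncorrectable. Three or more flipped bits may be miscorrected, which is one reason a second, backup-based mechanism is useful.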
Further, after the load-time ECC check code of the segment data of the application program is obtained, the method further includes: storing the obtained ECC check code in no fewer than three copies;
comparing the load-time ECC check code with the run-time ECC check code comprises:
comparing the stored copies of the ECC check code with one another;
and if at least two of the stored copies are consistent, extracting the consistent ECC check code and comparing it with the run-time ECC check code.
Further, comparing the load-time ECC check code with the run-time ECC check code further includes: extracting the consistent ECC check code and using it to replace any inconsistent copy.
Further, comparing the load-time ECC check code with the run-time ECC check code further includes:
and if no two of the stored copies are consistent, adopting a redundancy backup recovery mechanism to recover the code segment of the running application program.
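The voting among stored copies can be sketched as follows. This is an illustrative sketch only: the function name vote_and_repair is hypothetical, and check codes are represented as plain integers.

```python
from collections import Counter
from typing import List, Optional


def vote_and_repair(copies: List[int]) -> Optional[int]:
    """Return the value shared by at least two stored copies, or None.

    When a trusted value exists, every copy is rewritten with it, so a
    copy damaged by a bit flip is repaired in place.
    """
    value, count = Counter(copies).most_common(1)[0]
    if count < 2:
        return None          # no two copies agree: fall back to backup recovery
    for i in range(len(copies)):
        copies[i] = value
    return value


stored = [0x2A, 0x2A, 0x2B]          # third copy has suffered a bit flip
print(vote_and_repair(stored), stored)
```

A return value of None corresponds to the case in which the stored copies are all inconsistent and the redundancy backup recovery mechanism takes over.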
Further, if the comparison result indicates that two or more bits of the segment data are in error while the application program is running, a redundancy backup recovery mechanism is adopted to recover the code segment of the running application program.
Further, recovering the code segment of the running application program by the redundancy backup recovery mechanism comprises the following steps:
acquiring the code segment of the application program when the application program is loaded, and performing CRC (cyclic redundancy check) check code calculation on the code segment to obtain a load-time CRC check code;
performing CRC check code calculation on the code segment of the running application program to obtain a run-time CRC check code;
comparing the load-time CRC check code with the run-time CRC check code;
and if the two CRC check codes are inconsistent, extracting the code segment corresponding to the load-time CRC check code and using it to overwrite the code segment of the running application program.
Further, after acquiring the code segment of the application program when the application program is loaded, the method further comprises: compressing the acquired code segment;
before extracting the code segment corresponding to the load-time CRC check code, the method further includes: decompressing the compressed code segment corresponding to that CRC check code.
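A minimal sketch of this backup-and-overwrite path follows. It assumes CRC-32 and zlib compression from the Python standard library, and assumes the CRC is taken over the uncompressed code segment so that it can be compared directly with the running code; the text fixes none of these choices. The names CodeBackup, make_backup and restore_if_corrupt are hypothetical.

```python
import zlib
from dataclasses import dataclass


@dataclass
class CodeBackup:
    compressed: bytes   # compressed copy of the load-time code segment
    crc: int            # CRC-32 of the uncompressed load-time code segment


def make_backup(code_segment: bytes) -> CodeBackup:
    """Taken once, when the application program is loaded."""
    return CodeBackup(zlib.compress(code_segment), zlib.crc32(code_segment))


def restore_if_corrupt(running: bytearray, backup: CodeBackup) -> bool:
    """Overwrite the running code segment if its CRC no longer matches."""
    if zlib.crc32(running) == backup.crc:
        return False
    running[:] = zlib.decompress(backup.compressed)
    return True


code = bytes(range(256)) * 4
backup = make_backup(code)
running = bytearray(code)
running[10] ^= 0x03                 # simulate a multi-bit error
print(restore_if_corrupt(running, backup), bytes(running) == code)
```

Storing the backup compressed trades a decompression step at recovery time for lower memory overhead, which matches the motivation given for compression in the description.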
Further, after obtaining the code segment of the application program and its CRC check code when the application program is loaded, the method further includes: storing the obtained code segment and its CRC check code in no fewer than two copies;
extracting the code segment corresponding to the load-time CRC check code comprises:
performing an online check on the stored CRC check codes;
and if the online check results of a stored CRC check code are always consistent, extracting the code segment corresponding to that CRC check code.
Further, extracting the code segment corresponding to the CRC check code further includes: extracting a CRC check code whose online check results are always consistent, together with its code segment, and using them to replace any CRC check code and code segment whose online check results are inconsistent.
Further, the online check of the stored CRC check codes includes: comparing the current state of each stored CRC check code with its previous state at a preset time interval.
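One possible reading of the online check is sketched below. Each periodic call compares every stored CRC with the value seen at the previous check; as an extra safeguard that goes beyond the text, it also recomputes the CRC of the backup payload. The class name BackupSet is hypothetical, CRC-32 is an assumed choice, and the copies are kept uncompressed for brevity.

```python
import zlib
from typing import List, Tuple


class BackupSet:
    def __init__(self, code_segment: bytes, copies: int = 2):
        crc = zlib.crc32(code_segment)
        self.data = [bytearray(code_segment) for _ in range(copies)]
        self.crc = [crc] * copies
        self.last_crc = list(self.crc)   # state seen at the previous check

    def online_check(self) -> Tuple[List[int], List[int]]:
        """Run one periodic check; return (consistent copies, repaired copies)."""
        indices = range(len(self.crc))
        good = [i for i in indices
                if self.crc[i] == self.last_crc[i]
                and zlib.crc32(self.data[i]) == self.crc[i]]
        repaired = []
        if good:
            source = good[0]
            for i in indices:
                if i not in good:
                    # Replace the inconsistent copy with a consistent one.
                    self.data[i][:] = self.data[source]
                    self.crc[i] = self.crc[source]
                    repaired.append(i)
            self.last_crc = list(self.crc)
        return good, repaired


backups = BackupSet(b"code segment bytes", copies=2)
backups.crc[1] ^= 0x4                # simulate a flip in one stored CRC
print(backups.online_check())
```

In a real device the call would be driven by a timer at the preset interval; if no copy is consistent, the sketch leaves everything untouched so that the caller can apply its configured handling for unrecoverable errors.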
Further, the method also comprises: detecting and recovering key data pre-registered by the application program;
the method for detecting and recovering the key data comprises the following steps:
extracting the memory address of the key data based on the pre-registration information of the key data;
acquiring the memory data at the memory address when the application program is loaded, and performing CRC (cyclic redundancy check) check code calculation on the memory data to obtain a load-time CRC check code;
performing CRC check code calculation on the memory data at the memory address of the key data while the application program is running to obtain a run-time CRC check code;
comparing the load-time CRC check code with the run-time CRC check code;
and if the two CRC check codes are inconsistent, extracting the memory data corresponding to the load-time CRC check code and using it to overwrite the memory data at the memory address of the running application program.
Further, after obtaining the memory data at the memory address and its CRC check code when the application program is loaded, the method further includes: storing the obtained memory data and its CRC check code in no fewer than two copies;
extracting the memory data corresponding to the load-time CRC check code comprises:
performing an online check on the stored CRC check codes;
and if the online check results of a stored CRC check code are always consistent, extracting the memory data corresponding to that CRC check code.
Further, extracting the memory data corresponding to the CRC check code further includes: extracting a CRC check code whose online check results are always consistent, together with its memory data, and using them to replace any CRC check code and memory data whose online check results are inconsistent.
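The key data path can be sketched in the same spirit. The sketch models memory as a bytearray and registered key data as (address, length) pairs, which is an assumption made for illustration; the class name KeyDataGuard and its methods are hypothetical, and only one backup copy is kept per item, whereas the text calls for at least two.

```python
import zlib
from typing import Dict, Tuple


class KeyDataGuard:
    def __init__(self, memory: bytearray):
        self.memory = memory
        # (address, length) -> (backup bytes, CRC-32 of the backup)
        self.backups: Dict[Tuple[int, int], Tuple[bytes, int]] = {}

    def register(self, address: int, length: int) -> None:
        """Record a key data item and back up its current contents."""
        data = bytes(self.memory[address:address + length])
        self.backups[(address, length)] = (data, zlib.crc32(data))

    def check_and_restore(self) -> int:
        """Restore every registered item whose CRC has changed; return the count."""
        restored = 0
        for (address, length), (data, crc) in self.backups.items():
            if zlib.crc32(self.memory[address:address + length]) != crc:
                self.memory[address:address + length] = data
                restored += 1
        return restored


memory = bytearray(64)
memory[8:12] = b"\x01\x02\x03\x04"
guard = KeyDataGuard(memory)
guard.register(8, 4)
memory[9] ^= 0x20                    # simulate a bit flip in key data
print(guard.check_and_restore(), bytes(memory[8:12]))
```

Because each registered item carries its own address and length, the items need not be contiguous in memory, which is the situation the description gives as the reason for not applying ECC to key data.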
In order to achieve the above object, the present invention further provides a system for detecting and recovering memory bit flipping of a power secondary device, including: a single bit error recovery module, the single bit error recovery module comprising:
an ECC check code calculation submodule: configured to perform ECC check code calculation on the application program running area when the application program is loaded, according to the preset ECC segment length, to obtain the load-time ECC check code of the segment data of the application program, and to perform ECC check code calculation on the segment data while the application program is running, according to the preset ECC segment length, to obtain the run-time ECC check code of the segment data;
an ECC check code storage submodule: configured to store the load-time ECC check code, once obtained, in no fewer than three copies;
an ECC check code comparison submodule: configured to compare the stored copies of the ECC check code with one another and, if at least two of them are consistent, to extract the consistent ECC check code and compare it with the run-time ECC check code;
an ECC check code replacement submodule: configured to extract the consistent ECC check code and use it to replace any inconsistent copy;
a single-bit error correction submodule: configured to correct the erroneous bit if the comparison result indicates that a single-bit error has occurred in the segment data while the application program is running.
Further, the system comprises a redundant backup and recovery module for recovering the code segment of the application program in the operation process of the application program by adopting a redundant backup and recovery mechanism, wherein the redundant backup and recovery module comprises:
a code segment acquisition submodule: configured to acquire the code segment of the application program when the application program is loaded;
a CRC check code calculation submodule: configured to perform CRC check code calculation on the code segment when the application program is loaded to obtain a load-time CRC check code, and on the code segment of the running application program to obtain a run-time CRC check code;
a CRC check code storage submodule: configured to store the code segment and its CRC check code, once obtained when the application program is loaded, in no fewer than two copies;
a CRC check code online check submodule: configured to perform an online check on the stored CRC check codes;
a CRC check code comparison submodule: configured to compare the load-time CRC check code with the run-time CRC check code;
a code segment overwriting submodule: configured, if the online check results of a stored CRC check code are always consistent, to extract the code segment corresponding to that CRC check code and use it to overwrite the code segment of the running application program;
a CRC check code replacement submodule: configured to extract a CRC check code whose online check results are always consistent, together with its code segment, and to use them to replace any CRC check code and code segment whose online check results are inconsistent;
a code segment compression submodule: configured to compress the code segment after it is acquired when the application program is loaded, and to decompress the compressed code segment corresponding to the CRC check code before that code segment is extracted.
Further, the system also comprises a key data detection recovery module, wherein the key data detection recovery module comprises:
a memory address extraction submodule: configured to extract the memory address of the key data based on the pre-registration information of the key data;
a memory data acquisition submodule: configured to acquire the memory data at the memory address when the application program is loaded;
the CRC check code calculation submodule is further configured to perform CRC check code calculation on the memory data at the memory address when the application program is loaded to obtain a load-time CRC check code, and on the memory data at the memory address of the key data while the application program is running to obtain a run-time CRC check code;
the CRC check code storage submodule is further configured to store the memory data and its CRC check code, once obtained when the application program is loaded, in no fewer than two copies;
the code segment overwriting submodule is further configured, if the online check results of a stored CRC check code are always consistent, to extract the memory data corresponding to that CRC check code and use it to overwrite the memory data at the memory address of the running application program;
the CRC check code replacement submodule is further configured to extract a CRC check code whose online check results are always consistent, together with its memory data, and to use them to replace any CRC check code and memory data whose online check results are inconsistent.
In order to achieve the above object, the present invention also provides a computer processing control apparatus, comprising:
a memory: for storing instructions;
a processor: configured to operate according to the instructions so as to execute the steps of the detection and recovery method for memory bit flipping of power secondary equipment provided by the invention.
To achieve the above object, the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the method for detecting and recovering from memory bit flipping in a power secondary device provided by the present invention.
Compared with the prior art, the invention has the following beneficial effects. The method and the system combine two mechanisms, ECC error detection and correction and redundancy backup recovery, to achieve fast location and recovery of single-bit errors and recovery of multi-bit errors respectively. This balances error correction efficiency against error correction capability, makes up for the inability of ECC error detection and correction alone to correct errors of two or more bits, and helps ensure reliable operation of power secondary equipment. Because no detection and recovery circuit is required inside the processor, the method is unaffected by changes in processor process design, allows the existing architecture to be retained, saves development cost, can be extended to processors of different architectures, and has wide applicability.
Drawings
FIG. 1 is a schematic diagram illustrating address allocation of on-chip memory of a microcontroller according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating a process for backing up code segments of an application in accordance with an embodiment of the present invention;
FIG. 3 is a diagram illustrating a process for backing up critical data in an embodiment of the system;
FIG. 4 is a diagram illustrating a recovery process for a code segment of an application in accordance with an embodiment of the present invention;
FIG. 5 is a diagram illustrating a recovery process of critical data in an embodiment of the system.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
The addresses of program segment data are generally contiguous, which satisfies the operating requirements of an ECC (error correction code) error detection and correction algorithm, so a software ECC algorithm can be used to quickly locate and recover single-bit flip errors in program segment data. At the same time, because multi-bit data errors may occur in system memory, a redundancy backup recovery mechanism is added on top of ECC error detection and correction to improve the error correction capability of the error detection and correction program. The efficiency of the ECC mechanism and the strong error correction capability of the redundancy backup recovery mechanism complement each other, further ensuring the correctness of program segment data and the reliability of the system. The addresses of key data are generally not contiguous, which makes ECC error detection and correction difficult to apply, and the volume of key data is small, so a redundancy backup recovery mechanism alone meets the requirements of most applications. Key data is therefore protected by redundancy backup recovery only.
Based on the above technical idea, a specific embodiment of the present invention provides a method for detecting and recovering memory bit flipping of a secondary power device, including the following steps:
Step one: an error detection and correction program loads the application program and calculates ECC check codes for the application program running area segment by segment according to the ECC segment length (the segment length is determined by the number of bits of the ECC check code). Three copies of each ECC check code are stored so that an error in a check code during operation can be tolerated. The ECC check codes of all segments are stored in a reserved memory area in segment order, three copies per segment.
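The layout of the reserved area described in step one can be sketched as a flat table in segment order with three consecutive entries per segment. The ECC code used here (XOR of the 1-based positions of set bits plus an overall parity bit) is an assumption, as are all function names. Under that assumption a check code with r position bits can address at most 2^r - 1 data bits, which is one way the segment length can follow from the check code width.

```python
from typing import List, Tuple


def ecc_code(segment: bytes) -> Tuple[int, int]:
    """Assumed ECC: XOR of 1-based positions of set bits, plus overall parity."""
    position_xor = parity = 0
    for i, byte in enumerate(segment):
        for bit in range(8):
            if (byte >> bit) & 1:
                position_xor ^= i * 8 + bit + 1
                parity ^= 1
    return position_xor, parity


def max_segment_bytes(position_bits: int) -> int:
    """Largest whole-byte segment whose bit positions fit in position_bits."""
    return (2 ** position_bits - 1) // 8


def build_ecc_table(region: bytes, segment_bytes: int) -> List[Tuple[int, int]]:
    """Return the check codes of all segments in order, three copies each."""
    table: List[Tuple[int, int]] = []
    for start in range(0, len(region), segment_bytes):
        code = ecc_code(region[start:start + segment_bytes])
        table.extend([code, code, code])
    return table


def copies_for(table: List[Tuple[int, int]], segment_index: int) -> List[Tuple[int, int]]:
    """Fetch the three stored copies that belong to one segment."""
    return table[3 * segment_index:3 * segment_index + 3]


region = bytes(range(200))
table = build_ecc_table(region, 64)
print(len(table), max_segment_bytes(12))
```

Keeping the three copies of a segment adjacent and the segments in order means the copies for segment k can be found by index arithmetic alone, without a separate lookup structure.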
Step two: when the application program is loaded, the error detection and correction program separates out the code segment of the application program according to the file type, compresses the code segment data, and calculates its CRC check code. At least two copies of the compressed code segment data and the CRC check code are backed up and stored in system memory or external storage. Compression mainly serves to reduce the memory overhead of the redundant backup.
Step three: after the application program starts, it sets parameters of the error detection and correction program, such as the number of segments to be ECC-checked in a single execution of the error detection and correction program and the handling mode after an unrecoverable error occurs. The application program registers the key data to be protected through an interface provided by the error detection and correction program. The error detection and correction program obtains the memory addresses of the key data, backs up the memory data at those addresses, orders the data by means of a hash table, and stores it. The ordered content is stored as a whole in two or more copies, and a CRC (cyclic redundancy check) check code is calculated for each copy.
Step four: once the system is operating normally, the error detection and correction program periodically performs ECC error detection and correction on the original program segment data, one segment length at a time. Each time, it calculates a run-time ECC check code for one segment of program data and evaluates it against the three pre-stored ECC check codes. First, the three stored ECC check codes are compared; if two of them are consistent, that value is taken as correct, and any stored copy that differs is restored to the correct value. Then the correct stored ECC check code is compared with the run-time ECC check code: if they are consistent, no bit error has occurred in the data; otherwise a bit error has occurred. If a single-bit error is found in the segment data of the running application program, the correct ECC check code is used directly to correct and recover the segment data. To limit the impact of error detection and correction on system load, the error detection and correction program checks only one segment of data per run and checks the next segment on the next run.
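The one-segment-per-run scheduling in step four amounts to a round-robin cursor over the segments. A minimal sketch follows; the class name SegmentScanner is hypothetical, and the per-segment check is passed in as a callback so that the sketch stays independent of any particular ECC code.

```python
from typing import Callable, Tuple


class SegmentScanner:
    def __init__(self, segment_count: int, check: Callable[[int], str]):
        self.segment_count = segment_count
        self.check = check            # checks one segment, returns a status string
        self.next_segment = 0

    def run_once(self) -> Tuple[int, str]:
        """Check exactly one segment, then advance to the next one."""
        index = self.next_segment
        result = self.check(index)
        self.next_segment = (index + 1) % self.segment_count
        return index, result


scanner = SegmentScanner(3, lambda index: "ok")
print([scanner.run_once()[0] for _ in range(5)])
```

Each periodic invocation costs the time of a single segment check, so the worst-case detection latency for a given segment is the number of segments multiplied by the invocation period.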
Step five: if the section data of the running application program contains bit-flip errors in two or more bits, or if all three stored ECC check codes are wrong, the redundancy comparison recovery process is entered. First, the CRC check codes backed up in step two are checked online: at a preset time interval, the current state of each CRC check code is compared with its previous state. A CRC check code whose comparison results are always consistent is judged correct; otherwise it is judged wrong, and a wrong CRC check code and its code segment backup are promptly replaced with a correct CRC check code and its code segment backup. Then, for bit-flip errors of two or more bits, the code segment of the running application program is overwritten with the code segment that corresponds to the correct CRC check code. For the case where all three ECC check codes are wrong, a CRC check code is calculated over the code segment of the running application program and compared with the correct CRC check code. If they match, no bit-flip error has occurred in the running application program; if they do not match, a bit-flip error has occurred, and the running code segment is overwritten with the code segment that corresponds to the correct CRC check code.
When overwriting the code segment of the running application program, the error detection and correction program first calls a system interface to disable system preemption or to put the running program into a dormant state. It then decompresses the correct backup version and restores it to the corresponding memory, copies the correct backup version, its backup check code, and the ECC check code over any erroneous backup version, and finally re-enables system preemption or resumes the suspended program.
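The overwrite sequence of step five can be sketched as below. The sketch makes several assumptions: backups are compressed with `zlib`, each backup is validated by recomputing its CRC-32 (the periodic state comparison described in the text is not modelled), and `preemption_disabled` is a hypothetical placeholder for the system interface that disables preemption.

```python
import zlib
from contextlib import contextmanager

@contextmanager
def preemption_disabled():
    # Placeholder only: a real target would call its system interface here
    # to disable preemption, and re-enable it after the block.
    yield

def make_backup(code):
    """Compress the code segment and attach a CRC-32 of the compressed data."""
    packed = zlib.compress(bytes(code))
    return {"data": packed, "crc": zlib.crc32(packed)}

def backup_ok(backup):
    return zlib.crc32(backup["data"]) == backup["crc"]

def restore_code(code, backups):
    """Overwrite `code` (a bytearray) from the first valid backup.

    Returns False when no backup is valid, i.e. the error is unrecoverable.
    """
    good = next((b for b in backups if backup_ok(b)), None)
    if good is None:
        return False
    with preemption_disabled():
        code[:] = zlib.decompress(good["data"])
        for b in backups:
            if not backup_ok(b):              # repair damaged backups as well
                b["data"], b["crc"] = good["data"], good["crc"]
    return True
```

Repairing the damaged backup in the same critical section keeps both copies usable for later checks, which matches the intent of restoring the correct version over the erroneous backup.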
Step six: once the system is operating normally, the error detection and correction program periodically and cyclically checks the correctness of the registered key data. It obtains the memory address of the key data by querying the hash table and compares the content at that address with the backup version; that is, it calculates a CRC check code over the content at the memory address and compares it with the correct CRC check code backed up in step three. If the two CRC check codes are inconsistent, the memory data corresponding to the correct CRC check code is extracted and written over the memory data at the corresponding address of the running application program. During the overwrite, the error detection and correction program first calls a system interface to disable system preemption or to put the running program into a dormant state, then restores the correct backup version to the corresponding memory address, copies the correct backup and its check code over any erroneous backup version, and finally re-enables system preemption or resumes the suspended program.
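The registration and periodic check of key data in steps three and six can be sketched as follows. This is a toy model under assumptions: a `bytearray` stands in for RAM, a Python `dict` plays the role of the hash table, CRC-32 from `zlib` is used as the check code, and the class and method names (`KeyDataGuard`, `register`, `take_backups`, `check`) are hypothetical.

```python
import zlib

class KeyDataGuard:
    """Toy model of key data protection; `memory` is a bytearray standing in for RAM."""

    def __init__(self, memory):
        self.memory = memory
        self.table = {}      # hash table: name -> (address, length)
        self.backups = []    # redundant copies: {"data": bytes, "crc": int}

    def register(self, name, address, length):
        self.table[name] = (address, length)

    def _serialize(self):
        # Concatenate the registered regions in table (insertion) order.
        return b"".join(bytes(self.memory[a:a + n]) for a, n in self.table.values())

    def take_backups(self, copies=2):
        blob = self._serialize()
        self.backups = [{"data": blob, "crc": zlib.crc32(blob)} for _ in range(copies)]

    def check(self):
        good = next((b for b in self.backups
                     if zlib.crc32(b["data"]) == b["crc"]), None)
        if good is None:
            return "unrecoverable"
        if zlib.crc32(self._serialize()) == good["crc"]:
            return "ok"
        offset = 0
        for address, length in self.table.values():
            # Write the backed-up bytes back to each registered address.
            self.memory[address:address + length] = good["data"][offset:offset + length]
            offset += length
        return "restored"
```

Only registered regions are covered: a change elsewhere in the simulated memory does not affect the result of `check()`.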
In this method embodiment, the error detection and correction program runs continuously in the system background and keeps performing error detection and correction on both the program section data and the key data. Error recovery operations are executed under a sandbox-like mechanism, so that an abnormally interrupted recovery or the execution of abnormal code does not drive the whole system into an abnormal state, and the correctness of the recovery process and its result is guaranteed. After detecting and repairing an error, the error detection and correction program informs the application program of the error address, the error type, and the total number of errors. When an unrecoverable error occurs, the application program can handle the exception according to its own predefined behavior, including but not limited to reloading the application program, exiting the system, or restarting the system through a watchdog.
The system, namely the error detection and correction program, implements in software the functions of application program loading and booting, ECC check code calculation, ECC error detection and correction, data content backup and recovery, unrecoverable error handling, and the like. The number of sections covered by each ECC check can be flexibly configured by the application program, so the system can be applied to systems with different processor load levels: the error detection and correction frequency is set according to the actual processor load so that the overall system load stays at a reasonable level.
The system of the invention is based on a microcontroller environment without an operating system. Because no cache memory (Cache) or memory management unit (MMU) intervenes, logical addresses coincide with physical addresses, and the content the processor reads from an address is the real version in physical memory rather than a temporary version held in a cache, which makes the method of the invention easier to implement. This embodiment takes the microcontroller-based, operating-system-free environment as an example and implements the method in the on-chip memory of the microcontroller, thereby achieving online detection and recovery of errors in that on-chip memory. Other operating environments may also employ the method of the invention with some additional operations.
Fig. 1 is a schematic diagram of the address allocation of the on-chip memory of a microcontroller according to an embodiment of the present invention. Three programs run on the microcontroller: the BOOT startup program initializes the running environment of the microcontroller and the processor peripherals so that the whole system enters its initial running state; the error detection and correction program boots the application program, performs ECC (error correction code) error detection and correction on the application code, and backs up and recovers the application code and the key data; the application program implements the specific functions of the system. At design time, the memory between the low address of the memory reserved area (Resv_low) and its high address (Resv_high) is reserved for the exclusive use of the error detection and correction program, which uses it to store the segmented ECC check codes of the application code, the backups of the application code, and the backups of the key data. Other programs must not use the memory reserved area; this can be ensured at the compile and link stage by specifying run address spaces for those programs that avoid the reserved area.
In this embodiment, three copies of the segmented ECC check codes of the code segment are stored, namely ECC code A, ECC code B, and ECC code C; the code segment and the key data each have at least two redundant backups, namely backup A and backup B.
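The reserved-area layout can be modelled as a simple table of address ranges. Every address and size below is a hypothetical value chosen for illustration; only the names Resv_low and Resv_high and the list of stored items come from the description. The sketch carves the reserved area into sub-areas and checks that an assumed application region stays outside it.

```python
RESV_LOW, RESV_HIGH = 0x2000_8000, 0x2001_0000    # hypothetical bounds
APP_REGION = (0x2000_0000, 0x2000_8000)           # hypothetical application RAM

# (name, size in bytes) of each sub-area inside the reserved area; sizes are assumed.
SUB_AREAS = [
    ("ecc_code_a", 0x400), ("ecc_code_b", 0x400), ("ecc_code_c", 0x400),
    ("code_backup_a", 0x3000), ("code_backup_b", 0x3000),
    ("data_backup_a", 0x800), ("data_backup_b", 0x800),
]

def carve(base, sizes):
    """Lay the sub-areas out back to back; return ({name: (start, end)}, end address)."""
    layout, cursor = {}, base
    for name, size in sizes:
        layout[name] = (cursor, cursor + size)
        cursor += size
    return layout, cursor

def overlaps(r1, r2):
    """True if two half-open address ranges [start, end) intersect."""
    return r1[0] < r2[1] and r2[0] < r1[1]

layout, end = carve(RESV_LOW, SUB_AREAS)
```

On a real target the same constraint would be enforced by the linker configuration rather than checked at run time.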
The parameter setting, ECC check code calculation, and program segment backup process of the application program are shown in Fig. 2. The error detection and correction program calculates ECC check codes for the application program code section by section, according to the ECC section length, and stores them in the application code ECC area of the memory Resv area. To guard against the ECC check codes themselves changing during operation, three copies of the ECC check code of each data section are stored, namely ECC code A, ECC code B, and ECC code C. The error detection and correction program then separates out the code segment according to the file type of the application program, compresses the code segment data and calculates its check code, stores the compressed data and the check code in application code backup A and application code backup B in the memory Resv area, and then boots the application program.
The key parameter setting and the backup process for the application key data are shown in Fig. 3. After the application program starts running, it passes parameters to the error detection and correction program, such as the number of sections to be ECC-checked at one time and the processing mode to apply after an unrecoverable error occurs. The application program then registers the key data that needs protection through an interface provided by the error detection and correction program. The error detection and correction program obtains the memory addresses of all key data, backs up the content at those addresses, serializes it according to a hash table, and stores the entire serialized content in data backup A and data backup B of the key data backup area in turn. A check code is calculated independently for each backup, to verify the correctness of that backup, and is appended to the end of the backup data.
The error detection and recovery process for the program segment of the application program is shown in Fig. 4. Once the system is operating normally, the error detection and correction program periodically performs ECC error detection and correction on the original program data, one section length at a time; the section length was configured by the application program in the preceding step. On each pass, an ECC check code is calculated over one section length of program data and evaluated together with the pre-stored ECC code A, ECC code B, and ECC code C. If the original data contains a single-bit error, it is corrected and recovered directly; if a pre-stored ECC check code is in error, that check code is restored to the correct value. To limit the impact of error detection and correction on system load, the error detection and correction program checks only one section of data per run and moves on to the next section in the next run. When a multi-bit error that ECC cannot recover is detected, or when all ECC check codes are found to be wrong, the redundancy comparison recovery flow is entered. When all ECC check codes are found to be wrong, the program first verifies whether the currently running program is correct. If it is not, backup A and backup B are verified in turn. If backup A is correct, it is restored to the running code segment; system preemption is disabled during the restore so that the application program cannot preempt execution while the code segment is being restored and cause a runtime fault. If backup A is incorrect and backup B is correct, backup B is restored to the running code segment and is also restored to backup A, so that subsequent verification can continue correctly.
If neither backup A nor backup B is correct, a very serious, unrecoverable error has occurred. The error detection and correction program then handles the system in the processing mode preset by the application program, such as restarting the system or terminating the application program. The error detection and correction program also records each detected error in its own dedicated variables for the application program to inspect. When a multi-bit error that ECC cannot recover is detected, only backup A and backup B need to be verified; there is no need to verify whether the currently running program is correct, and the subsequent flow is the same as when all ECC check codes are wrong.
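The load-limiting behaviour described for Fig. 4, where each run checks only a configured number of sections and the next run continues where the previous one stopped, can be sketched as a round-robin scanner. The class name `SectionScanner` and its parameters are hypothetical; the sketch only yields the byte ranges to check and leaves the ECC evaluation itself out.

```python
class SectionScanner:
    """Round-robin selection of the sections to check in each periodic run."""

    def __init__(self, total_len, section_len, sections_per_run=1):
        self.total_len = total_len
        self.section_len = section_len
        self.sections_per_run = sections_per_run
        self.count = -(-total_len // section_len)   # ceiling division
        self.next_index = 0

    def run_once(self):
        """Return the (start, end) byte ranges to check in this run."""
        ranges = []
        for _ in range(min(self.sections_per_run, self.count)):
            start = self.next_index * self.section_len
            end = min(start + self.section_len, self.total_len)
            ranges.append((start, end))
            # Wrap around after the last section so the scan repeats forever.
            self.next_index = (self.next_index + 1) % self.count
        return ranges
```

Raising `sections_per_run` shortens the time needed to cover the whole program at the cost of more processor load per run, which is the trade-off the configurable section count is meant to expose.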
The error detection and recovery process for the application key data is shown in Fig. 5. Once the system is operating normally, the error detection and correction program periodically and cyclically checks the correctness of the registered key data. It obtains the memory address of the data by querying the hash table and compares the key data with backup A; if they are consistent, no error has occurred and the program returns directly to wait for the next run. If the key data is inconsistent with backup A, it is compared with backup B; if it is consistent with backup B, backup B is restored to backup A and the check ends. If the key data differs from both backup A and backup B, backup A and backup B are each verified. If backup A is correct, it is restored to the key data; if backup A is incorrect and backup B is correct, backup B is restored to the key data and is also restored to backup A, so that subsequent verification can continue correctly. System preemption must be disabled during the restore so that the application program cannot preempt execution while the content is being restored and cause a runtime fault. If neither backup A nor backup B is correct, a very serious, unrecoverable error has occurred. The error detection and correction program then handles the system in the processing mode preset by the application program, such as restarting the system or terminating the application program. The error detection and correction program also records each detected error in its own dedicated variables for the application program to inspect.
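The decision flow of Fig. 5 can be sketched as a single function. The sketch rests on assumptions: CRC-32 from `zlib` is the check code, the function names are hypothetical, and the branch taken when the live data differs from backup A but matches backup B (copy backup B over backup A) is one reading of the flow. Disabling preemption is indicated only by a comment.

```python
import zlib

def make_copy(data):
    """Create one backup: the data plus a CRC-32 check code."""
    raw = bytes(data)
    return {"data": raw, "crc": zlib.crc32(raw)}

def copy_ok(copy):
    return zlib.crc32(copy["data"]) == copy["crc"]

def check_key_data(live, a, b, on_fatal):
    """Compare live key data (a bytearray) with backups A and B and repair as needed."""
    if bytes(live) == a["data"]:
        return "ok"
    if bytes(live) == b["data"]:
        a.update(b)                   # backup A was damaged; rebuild it from B
        return "backup_a_repaired"
    # A real target would disable preemption around the writes below.
    if copy_ok(a):
        live[:] = a["data"]
        return "restored_from_a"
    if copy_ok(b):
        live[:] = b["data"]
        a.update(b)                   # keep backup A usable for later checks
        return "restored_from_b"
    on_fatal()                        # handling preset by the application program
    return "unrecoverable"
```

The `on_fatal` callback stands for the processing mode preset by the application program, for example a system restart.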
Divided by functional module, the system of the invention mainly comprises:
(1) Single-bit error recovery module
ECC check code calculation submodule: used for performing ECC check code calculation on the application program running area when the application program is loaded, according to the preset ECC section length, to obtain the ECC check code of the section data of the application program at load time; and for performing ECC check code calculation on the section data of the running application program, according to the preset ECC section length, to obtain the ECC check code of the section data at run time;
ECC check code storage submodule: used for storing the ECC check code obtained at load time, in no fewer than three copies;
ECC check code comparison submodule: used for comparing the stored ECC check codes one by one and, if at least two of the stored ECC check codes agree, extracting the agreeing ECC check code and comparing it with the ECC check code obtained at run time;
ECC check code replacement submodule: used for extracting the agreeing ECC check code and replacing with it any stored ECC check code that disagrees;
Single-bit error correction submodule: used for correcting the erroneous bit if the comparison result indicates that a single-bit error has occurred in the section data of the running application program.
(2) Redundant backup recovery module
Code segment acquisition submodule: used for acquiring the code segment of the application program when the application program is loaded;
CRC check code calculation submodule: used for performing CRC check code calculation on the code segment of the application program when the application program is loaded, to acquire the CRC check code of the code segment at load time; and for performing CRC check code calculation on the code segment of the running application program, to acquire the CRC check code of the code segment at run time;
CRC check code storage submodule: used for storing the code segment and its CRC check code obtained at load time, in no fewer than two copies;
CRC check code online check submodule: used for performing online checks on the stored CRC check codes;
CRC check code comparison submodule: used for comparing the CRC check code obtained at run time with the CRC check code obtained at load time;
Code segment overwrite submodule: used for extracting the code segment corresponding to a stored CRC check code whose online check results have remained consistent, and overwriting the code segment of the running application program with it;
CRC check code replacement submodule: used for extracting a CRC check code whose online check results have remained consistent, together with its code segment, and replacing with them any CRC check code and code segment whose online check results are inconsistent;
Code segment compression submodule: used for compressing the code segment after it is acquired at load time, and for decompressing the compressed code segment corresponding to a CRC check code before that code segment is extracted.
(3) Key data detection and recovery module
Memory address extraction submodule: used for extracting the memory address of the key data based on the pre-registration information of the key data;
Memory data acquisition submodule: used for acquiring the memory data at the memory address when the application program is loaded;
The CRC check code calculation submodule is further used for performing CRC check code calculation on the memory data at the memory address when the application program is loaded, to obtain the CRC check code at load time, and for performing CRC check code calculation on the memory data at the memory address of the key data while the application program is running, to obtain the CRC check code at run time;
The CRC check code storage submodule is further used for storing the memory data and its CRC check code obtained at load time, in no fewer than two copies;
The code segment overwrite submodule is further used for extracting the memory data corresponding to a stored CRC check code whose online check results have remained consistent, and overwriting the memory data at the memory address of the running application program with it;
The CRC check code replacement submodule is further used for extracting a CRC check code whose online check results have remained consistent, together with its memory data, and replacing with them any CRC check code and memory data whose online check results are inconsistent.
An embodiment of the present invention also provides a computer processing control apparatus, including:
a memory: used for storing instructions;
a processor: used for operating according to the instructions to execute the steps of the detection and recovery method for memory bit flipping of power secondary equipment provided by the invention.
An embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the detection and recovery method for memory bit flipping of power secondary equipment provided by the invention.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and modifications without departing from the technical principle of the present invention, and such improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (15)

1. A detection and recovery method for memory bit flipping of power secondary equipment, characterized by comprising the following steps:
performing ECC check code calculation on the application program running area when the application program is loaded, according to a preset ECC section length, to obtain the ECC check code of the section data of the application program at load time;
performing ECC check code calculation on the section data of the application program while the application program is running, according to the preset ECC section length, to obtain the ECC check code of the section data of the application program at run time;
comparing the ECC check code obtained at run time with the ECC check code obtained at load time;
if it is judged from the comparison result that a single-bit error has occurred in the section data of the running application program, correcting the erroneous bit;
if it is judged from the comparison result that the number of erroneous bits in the section data of the running application program is not less than two, recovering the section data of the running application program by a redundancy backup recovery mechanism;
wherein recovering the code segment of the running application program by the redundancy backup recovery mechanism comprises:
acquiring the code segment of the application program when the application program is loaded, and performing CRC (cyclic redundancy check) code calculation on the code segment to acquire the CRC check code of the code segment at load time;
performing CRC check code calculation on the code segment of the application program while the application program is running, to acquire the CRC check code of the code segment at run time;
comparing the CRC check code obtained at run time with the CRC check code obtained at load time;
and if the two CRC check codes are inconsistent, extracting the code segment corresponding to the CRC check code obtained at load time, and overwriting the code segment of the running application program with it.
2. The method for detecting and recovering memory bit flipping of power secondary equipment according to claim 1, further comprising, after acquiring the ECC check code of the section data of the application program at load time: storing the obtained ECC check code in no fewer than three copies;
wherein comparing the ECC check code obtained at run time with the ECC check code obtained at load time comprises:
comparing the stored ECC check codes one by one;
if at least two of the stored ECC check codes agree, extracting the agreeing ECC check code and comparing it with the ECC check code obtained at run time.
3. The method for detecting and recovering memory bit flipping of power secondary equipment according to claim 2, wherein comparing the ECC check code obtained at run time with the ECC check code obtained at load time further comprises: extracting the agreeing ECC check code and replacing with it any stored ECC check code that disagrees.
4. The method for detecting and recovering memory bit flipping of power secondary equipment according to claim 2, wherein comparing the ECC check code obtained at run time with the ECC check code obtained at load time further comprises:
if none of the stored ECC check codes agree with one another, recovering the code segment of the running application program by the redundancy backup recovery mechanism.
5. The method for detecting and recovering memory bit flipping of power secondary equipment according to claim 1, further comprising, after acquiring the code segment of the application program when the application program is loaded: compressing the acquired code segment;
and, before extracting the code segment corresponding to the CRC check code, further comprising: decompressing the compressed code segment corresponding to the CRC check code.
6. The method for detecting and recovering memory bit flipping of power secondary equipment according to claim 1, further comprising, after acquiring the code segment of the application program and its CRC check code when the application program is loaded: storing the obtained code segment and its CRC check code in no fewer than two copies;
wherein extracting the code segment corresponding to the CRC check code comprises:
performing online checks on the stored CRC check codes;
and if the online check results of a stored CRC check code have remained consistent, extracting the code segment corresponding to that CRC check code.
7. The method for detecting and recovering memory bit flipping of power secondary equipment according to claim 6, wherein extracting the code segment corresponding to the CRC check code further comprises: extracting a CRC check code whose online check results have remained consistent, together with its code segment, and replacing with them any CRC check code and code segment whose online check results are inconsistent.
8. The method for detecting and recovering memory bit flipping of power secondary equipment according to claim 6, wherein performing online checks on the stored CRC check codes comprises: comparing the current state of each stored CRC check code with its previous state at a preset time interval.
9. The method for detecting and recovering memory bit flipping of power secondary equipment according to claim 1, further comprising: detecting and recovering key data pre-registered in the application program;
wherein detecting and recovering the key data comprises:
extracting the memory address of the key data based on the pre-registration information of the key data;
acquiring the memory data at the memory address when the application program is loaded, and performing CRC (cyclic redundancy check) code calculation on the memory data to acquire the CRC check code at load time;
performing CRC check code calculation on the memory data at the memory address of the key data while the application program is running, to acquire the CRC check code at run time;
comparing the CRC check code obtained at run time with the CRC check code obtained at load time;
and if the two CRC check codes are inconsistent, extracting the memory data corresponding to the CRC check code obtained at load time, and overwriting the memory data at the memory address of the running application program with it.
10. The method for detecting and recovering memory bit flipping of power secondary equipment according to claim 9, further comprising, after acquiring the memory data at the memory address and its CRC check code when the application program is loaded: storing the acquired memory data and its CRC check code in no fewer than two copies;
wherein extracting the memory data corresponding to the CRC check code comprises:
performing online checks on the stored CRC check codes;
and if the online check results of a stored CRC check code have remained consistent, extracting the memory data corresponding to that CRC check code.
11. The method for detecting and recovering memory bit flipping of power secondary equipment according to claim 10, wherein extracting the memory data corresponding to the CRC check code further comprises: extracting a CRC check code whose online check results have remained consistent, together with its memory data, and replacing with them any CRC check code and memory data whose online check results are inconsistent.
12. A system for detecting and recovering from memory bit flipping in power secondary equipment, characterized by comprising: a single-bit error recovery module, the single-bit error recovery module comprising:
an ECC check code calculation submodule: configured to perform ECC check code calculation, according to a preset ECC section length, on the section data of the application program running area when the application program is loaded, to obtain the ECC check code of the section data at load time, and to perform ECC check code calculation, according to the preset ECC section length, on the section data of the application program while it is running, to obtain the ECC check code of the section data at run time;
an ECC check code storage submodule: configured to store the obtained ECC check code after the ECC check code of the section data of the application program is obtained at load time, wherein no fewer than three copies are stored;
an ECC check code comparison submodule: configured to compare the stored ECC check codes one by one and, if at least two of the stored ECC check codes are consistent with each other, to extract a consistent ECC check code and compare it with the ECC check code of the section data at run time;
an ECC check code replacement submodule: configured to extract the ECC check codes whose comparison results are consistent and to use them to replace the ECC check codes whose comparison results are inconsistent;
a single-bit error correction submodule: configured to correct the erroneous bit if it is determined from the comparison result that a single-bit error has occurred in the section data of the application program at run time;
the system further comprising a redundant backup recovery module for recovering the code segment of the running application program by means of a redundant backup recovery mechanism, wherein the redundant backup recovery module comprises:
a code segment acquisition submodule: configured to acquire the code segment of the application program when the application program is loaded;
a CRC check code calculation submodule: configured to perform CRC check code calculation on the code segment of the application program when the application program is loaded, to obtain the CRC check code of the code segment at load time, and to perform CRC check code calculation on the code segment of the application program while it is running, to obtain the CRC check code of the code segment at run time;
a CRC check code storage submodule: configured to store the acquired code segment and its CRC check code after the code segment of the application program and its CRC check code are obtained at load time, wherein no fewer than two copies are stored;
a CRC check code online check submodule: configured to perform an online check on the stored CRC check codes;
a CRC check code comparison submodule: configured to compare the CRC check code of the code segment at load time with the CRC check code of the code segment at run time;
a code segment overwriting submodule: configured to, if there is a stored CRC check code whose online check results are always consistent, extract the code segment corresponding to that CRC check code and overwrite the code segment of the running application program;
a CRC check code replacement submodule: configured to extract the CRC check codes and associated code segments whose online check results are always consistent, and to use them to replace the CRC check codes and associated code segments whose online check results are inconsistent;
a code segment compression submodule: configured to compress the acquired code segment after the code segment of the application program is acquired at load time, and to decompress the compressed code segment corresponding to a CRC check code before that code segment is extracted.
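The single-bit error recovery module of claim 12 can be illustrated with a small sketch. The claims do not specify the ECC algorithm, so the code below assumes a simple Hamming-style code (the XOR of the 1-based positions of all set bits, plus an overall parity bit); the names `ecc_of`, `vote`, and `correct_section` and the section length of 64 bytes are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple

# (XOR of 1-based positions of all set bits, overall parity of the section)
Ecc = Tuple[int, int]


def ecc_of(section: bytes) -> Ecc:
    """Compute the assumed Hamming-style check code for one section."""
    pos_xor = 0
    parity = 0
    for i, byte in enumerate(section):
        for bit in range(8):
            if (byte >> bit) & 1:
                pos_xor ^= i * 8 + bit + 1
                parity ^= 1
    return pos_xor, parity


def vote(copies: List[Ecc]) -> Optional[Ecc]:
    """Pick the value held by at least two stored copies and repair the rest.

    Returns None when no two copies agree.
    """
    value, count = Counter(copies).most_common(1)[0]
    if count < 2:
        return None
    copies[:] = [value] * len(copies)  # replace the outvoted copies
    return value


def correct_section(section: bytearray, stored: Ecc) -> str:
    """Compare run-time ECC with the load-time ECC and fix a single flip."""
    pos_xor, parity = ecc_of(section)
    syndrome = pos_xor ^ stored[0]
    parity_diff = parity ^ stored[1]
    if syndrome == 0 and parity_diff == 0:
        return "ok"
    if parity_diff == 1 and 1 <= syndrome <= len(section) * 8:
        index = syndrome - 1  # the syndrome is the 1-based flipped position
        section[index // 8] ^= 1 << (index % 8)
        return "corrected"
    return "uncorrectable"  # e.g. a double-bit error


if __name__ == "__main__":
    SECTION_LEN = 64  # hypothetical preset ECC section length, in bytes
    image = bytearray(range(256))
    offsets = range(0, len(image), SECTION_LEN)
    stored = [[ecc_of(image[o:o + SECTION_LEN])] * 3 for o in offsets]

    image[70] ^= 0x08  # simulated single-bit flip in the second section
    for o, copies in zip(offsets, stored):
        reference = vote(copies)
        if reference is None:
            continue
        section = bytearray(image[o:o + SECTION_LEN])
        status = correct_section(section, reference)
        image[o:o + SECTION_LEN] = section
        print(o, status)
```

With this assumed code, a single flipped bit changes the parity and makes the syndrome equal to the position of the flipped bit, which is why it can be corrected in place. Errors of two or more bits fall outside what this sketch can repair, which is the case the redundant backup recovery module of the claim addresses.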
13. The system for detecting and recovering from memory bit flipping in power secondary equipment according to claim 12, further comprising a key data detection and recovery module, wherein the key data detection and recovery module comprises:
a memory address extraction submodule: configured to extract the memory address of the key data based on pre-registration information of the key data;
a memory data acquisition submodule: configured to acquire the memory data at the memory address when the application program is loaded;
the CRC check code calculation submodule is further configured to perform CRC check code calculation on the memory data at the memory address when the application program is loaded, to obtain a CRC check code, and to perform CRC check code calculation on the memory data at the memory address of the key data while the application program is running, to obtain a CRC check code;
the CRC check code storage submodule is further configured to store the acquired memory data and its CRC check code after the memory data at the memory address and its CRC check code are acquired at load time, wherein no fewer than two copies are stored;
the code segment overwriting submodule is further configured to, if there is a stored CRC check code whose online check results are always consistent, extract the memory data corresponding to that CRC check code and overwrite the memory data at the memory address while the application program is running;
the CRC check code replacement submodule is further configured to extract the CRC check codes and associated memory data whose online check results are always consistent, and to use them to replace the CRC check codes and associated memory data whose online check results are inconsistent.
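The pre-registration step of claim 13 can be pictured as a table that maps each piece of key data to an address range, which then drives the load-time snapshot and the run-time scan. The sketch below is only an assumption about how such a table might look: the registry layout (name to offset and length), the region names, and the functions `snapshot` and `scan` are all hypothetical, and CRC-32 from `zlib` stands in for the unspecified CRC.

```python
import zlib
from typing import Dict, List, Tuple

Registry = Dict[str, Tuple[int, int]]         # name -> (offset, length)
Saved = Dict[str, List[Tuple[bytes, int]]]    # name -> [(data, crc), ...]


def snapshot(ram: bytearray, registry: Registry, copies: int = 2) -> Saved:
    """At load time, save several (data, CRC) copies of each registered region."""
    if copies < 2:
        raise ValueError("the claims call for no fewer than two copies")
    saved: Saved = {}
    for name, (off, length) in registry.items():
        data = bytes(ram[off:off + length])
        saved[name] = [(data, zlib.crc32(data)) for _ in range(copies)]
    return saved


def scan(ram: bytearray, registry: Registry, saved: Saved) -> List[str]:
    """At run time, restore every registered region whose CRC has changed.

    Returns the names of the regions that were overwritten.
    """
    restored = []
    for name, (off, length) in registry.items():
        good = [(d, c) for d, c in saved[name] if zlib.crc32(d) == c]
        if not good:
            continue  # no trustworthy copy; left to a higher-level handler
        data, crc = good[0]
        saved[name] = [(data, crc)] * len(saved[name])  # repair bad copies
        if zlib.crc32(bytes(ram[off:off + length])) != crc:
            ram[off:off + length] = data
            restored.append(name)
    return restored


if __name__ == "__main__":
    registry = {"trip_threshold": (0, 4), "ct_ratio": (8, 2)}
    ram = bytearray(range(16))
    saved = snapshot(ram, registry)
    ram[1] ^= 0x40  # simulated bit flip inside a registered region
    print(scan(ram, registry, saved))
```

Only registered ranges are checked, so memory that legitimately changes at run time is never overwritten by the scan.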
14. A computer processing control device, characterized by comprising:
a memory for storing instructions;
a processor configured to operate in accordance with the instructions to perform the steps of the method of any one of claims 1 to 11.
15. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, carries out the steps of the method according to any one of claims 1 to 11.

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010299597.XA CN111552590B (en) 2020-04-16 2020-04-16 Detection and recovery method and system for memory bit overturning of power secondary equipment
PCT/CN2020/114368 WO2021208341A1 (en) 2020-04-16 2020-09-10 Method and system for detecting and recovering memory bit flipping in secondary power equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010299597.XA CN111552590B (en) 2020-04-16 2020-04-16 Detection and recovery method and system for memory bit overturning of power secondary equipment

Publications (2)

Publication Number Publication Date
CN111552590A (en) 2020-08-18
CN111552590B (en) 2022-09-30

Family

ID=72007435

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010299597.XA Active CN111552590B (en) 2020-04-16 2020-04-16 Detection and recovery method and system for memory bit overturning of power secondary equipment

Country Status (2)

Country Link
CN (1) CN111552590B (en)
WO (1) WO2021208341A1 (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111552590B (en) * 2020-04-16 2022-09-30 国电南瑞科技股份有限公司 Detection and recovery method and system for memory bit overturning of power secondary equipment
CN112053737B (en) * 2020-08-21 2022-08-26 国电南瑞科技股份有限公司 Online parallel processing soft error real-time error detection and recovery method and system
CN114253758A (en) * 2020-09-21 2022-03-29 华为技术有限公司 Data processing method and related device
CN114598418A (en) * 2020-12-07 2022-06-07 山东新松工业软件研究院股份有限公司 Method, device and system applied to encoder data transmission
CN112860500B (en) * 2021-02-22 2024-03-22 四川腾盾科技有限公司 Power-on self-detection method for redundant aircraft management computer board
CN114238035B (en) * 2022-02-23 2022-06-21 南京芯驰半导体科技有限公司 Method and system for error detection through running state fingerprint
CN114579352A (en) * 2022-04-29 2022-06-03 阿里云计算有限公司 Data reconstruction method and device
CN115421967B (en) * 2022-11-04 2022-12-30 中国电力科学研究院有限公司 Method and system for evaluating storage abnormal risk point of secondary equipment
CN116107800B (en) * 2023-04-12 2023-08-15 浙江恒业电子股份有限公司 Verification code generation method, data recovery method, medium and electronic equipment
CN117194110A (en) * 2023-09-20 2023-12-08 通号通信信息集团上海有限公司 Data heterogeneous storage system and abnormal data self-recovery method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101232348A (en) * 2006-10-04 2008-07-30 马维尔国际贸易有限公司 Method and device for error correcting using cyclic redundancy check
CN104598342A (en) * 2014-12-31 2015-05-06 曙光信息产业(北京)有限公司 Internal storage detection method and device
CN109800104A (en) * 2018-12-18 2019-05-24 盛科网络(苏州)有限公司 Detection method, device, storage medium and the electronic device of data storage
CN110289041A (en) * 2019-06-25 2019-09-27 浙江大学 Memory detection device of the BIST in conjunction with ECC in a kind of System on Chip/SoC

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120110411A1 (en) * 2010-10-29 2012-05-03 Brocade Communications Systems, Inc. Content Addressable Memory (CAM) Parity And Error Correction Code (ECC) Protection
CN104616698A (en) * 2015-01-28 2015-05-13 山东华翼微电子技术股份有限公司 Method for sufficiently utilizing memory redundancy unit
CN108345430B (en) * 2017-12-27 2021-08-10 北京兆易创新科技股份有限公司 Nand flash element and operation control method and device thereof
CN110222501B (en) * 2019-05-31 2023-05-12 河南思维轨道交通技术研究院有限公司 Method for checking running code and storage medium
CN111552590B (en) * 2020-04-16 2022-09-30 国电南瑞科技股份有限公司 Detection and recovery method and system for memory bit overturning of power secondary equipment

Also Published As

Publication number Publication date
WO2021208341A1 (en) 2021-10-21
CN111552590A (en) 2020-08-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant