CN113626246A - Single-bit overturning fast repairing method and device, computer equipment and storage medium - Google Patents


Publication number
CN113626246A
Authority
CN
China
Prior art keywords
program
checksum
original
error
subprogram
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111001156.8A
Other languages
Chinese (zh)
Inventor
于杨
习伟
姚浩
李肖博
姚睿
董志平
王富亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southern Power Grid Digital Grid Research Institute Co Ltd
Beijing Sifang Engineering Co Ltd
Original Assignee
Southern Power Grid Digital Grid Research Institute Co Ltd
Beijing Sifang Engineering Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southern Power Grid Digital Grid Research Institute Co Ltd, Beijing Sifang Engineering Co Ltd filed Critical Southern Power Grid Digital Grid Research Institute Co Ltd
Priority to CN202111001156.8A priority Critical patent/CN113626246A/en
Publication of CN113626246A publication Critical patent/CN113626246A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793 Remedial or corrective actions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1008 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices
    • G06F11/1012 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices using codes or arrangements adapted for a specific type of error

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Techniques For Improving Reliability Of Storages (AREA)
  • Detection And Correction Of Errors (AREA)

Abstract

The application relates to a single-bit flip fast repair method and device, computer equipment, and a storage medium. Repair is carried out mainly by invoking an interrupt response service. Compared with prior-art schemes that require an added memory controller for repair, no additional hardware resources are needed, and data read by the processor does not have to pass through the memory controller, which improves the real-time performance of data processing on the chip. The method comprises the following steps: after power-on, dividing the original program segment in the internal RAM into a plurality of sub-programs; copying the sub-programs as copy program segments; computing an original program checksum for each copy program segment, and storing the original program checksums and the copy program segments in an external RAM; when triggered by a timer, invoking an interrupt response service to compute a real-time checksum of each sub-program, and comparing the real-time checksum with the original program checksum to obtain the program state of the sub-program; and repairing the original program segment according to the program state.

Description

Single-bit flip fast repair method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of digital processing chip technology, and in particular, to a single-bit flip fast repair method, apparatus, computer device, and storage medium.
Background
In computer chip technology, single-bit flips can occur in memory because of the chip packaging material. A single-bit flip means that one bit of a stored binary value is inverted; for example, a single flipped bit in the binary representation of the character 4 can cause it to be displayed as $. Single-bit flips are random, and the resulting error differs each time. Rewriting the data or resetting the chip restores normal operation, but the error cannot be recovered while the program is running. When large amounts of data are stored and processed, having to reset every time wastes considerable time and computing resources.
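The example of 4 turning into $ can be checked directly. The following is a minimal Python sketch that assumes the character is stored in ASCII (the source does not name the encoding); the helper name `flip_bit` is illustrative only.

```python
def flip_bit(value: int, bit: int) -> int:
    """Return `value` with the bit at position `bit` inverted."""
    return value ^ (1 << bit)

original = ord("4")                 # 0x34 = 0b0011_0100
corrupted = flip_bit(original, 4)   # invert bit 4 -> 0x24 = 0b0010_0100
print(chr(original), "->", chr(corrupted))  # prints: 4 -> $
```

Flipping the same bit a second time restores the original value, which is the basis of the repair described later in the document.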
At present, the single-bit flip problem is generally solved by adding hardware, for example by adding an EDAC (Error Detection And Correction) structure to the memory controller. The main idea of this hardware structure is as follows: when original data is written to memory, a check code of a certain number of bits is generated from the data and stored together with it; when the data is read out, the check code is read out as well, and a judgment is made from the check code and the data that was read. For example, with a conventional parity check code, the parity bit is set to 1 when the number of 1 bits in the original data is odd. If a single bit flips, the number of 1 bits becomes even, and comparison with the stored parity bit shows that a single-bit flip has occurred. The error can then be corrected automatically: the correct data is output, the corrected data is written back over the erroneous data, and the correct data is finally passed to the processor. All of these actions are performed automatically by the hardware.
However, this approach increases hardware resource overhead, and every data read must pass through the memory controller, which reduces the real-time performance of data processing.
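The parity check described in the background can be illustrated with a short Python sketch. It follows the stated convention (parity bit is 1 when the count of 1 bits is odd); the function name `parity_bit` and the sample data word are assumptions made for illustration, not part of any EDAC hardware interface.

```python
def parity_bit(word: int) -> int:
    """Return 1 if `word` contains an odd number of 1 bits, else 0."""
    return bin(word).count("1") % 2

data = 0b1011_0010                   # four 1 bits -> parity bit 0
stored_parity = parity_bit(data)     # written alongside the data
corrupted = data ^ (1 << 5)          # simulate a single-bit flip in storage
error_detected = parity_bit(corrupted) != stored_parity
print(stored_parity, error_detected)  # prints: 0 True
```

Note that a single parity bit only reveals that an odd number of bits changed; it does not say which bit, so locating the flipped bit requires additional information such as a stronger code or a reference copy.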
Disclosure of Invention
In view of the foregoing, it is desirable to provide a single-bit flipping fast repairing method, apparatus, computer device and storage medium.
A single-bit flip fast repair method, the method comprising:
after power-on, dividing an original program segment in the internal RAM into a plurality of sections of subprograms;
copying the plurality of sections of subprograms as duplicate program sections;
respectively solving an original program checksum for each copy program segment, and storing the original program checksum and the copy program segment in an external RAM;
under the trigger of a timer, calling an interrupt response service to obtain a real-time checksum of each section of subprogram, and comparing the real-time checksum with the original program checksum to obtain the program state of the subprogram;
and repairing the original program segment according to the program state.
In one embodiment, the invoking an interrupt response service to obtain a real-time checksum of each sub-program under the trigger of a timer, and comparing the real-time checksum with the original program checksum to obtain the program state of the sub-program includes:
calling an interrupt response service to compare the real-time checksum with the original program checksum to obtain a comparison result;
and if the comparison result is a check error, judging that the program state of the subprogram is a program error.
In one embodiment, the repairing the original program segment according to the program state includes:
if the program state is program error, acquiring the initial address of the corresponding error subprogram in the internal RAM and the length of the occupied space of the program;
comparing the error subprogram with the corresponding copy program segment field by field according to the initial address and the length of the occupied space of the program to obtain an error field;
comparing the error field with the corresponding copy field bit by bit to obtain an error bit number;
and repairing the original program segment according to the error bit number.
In one embodiment, the repairing the original program segment according to the error bit number includes:
if the error bit number is only one, flipping the corresponding erroneous bit to obtain the corrected sub-program;
invoking the interrupt response service to check the corrected sub-program to obtain a check result;
and if every check result within a preset time period indicates that the program is normal, determining that the repair is successful.
In one embodiment, the internal RAM includes a data constant region; before the invoking the interrupt response service checks the corrected subroutine, the method further includes:
judging whether the data constant area changes or not;
if the data constant area has changed, copying the corresponding data of the external backup data area into the data constant area, wherein the external backup data area is obtained by copying the data of the original data constant area to the external RAM at power-on.
In one embodiment, if the error bit number comprises multiple bits, a restart instruction is triggered to restart the entire chip.
In one embodiment, when the interrupt response service executes, other programs stop running.
A single-bit flip fast repair device, the device comprising:
the subprogram segmentation module is used for dividing the original program segment in the internal RAM into a plurality of sections of subprograms after being electrified;
the copy program segment copying module is used for copying the plurality of sections of subprograms as a copy program segment;
the original program check sum solving module is used for solving an original program check sum for each section of copy program section respectively and storing the original program check sum and the copy program section in an external RAM;
the program state judgment module is used for calling an interrupt response service to obtain a real-time checksum of each section of subprogram under the trigger of a timer, and comparing the real-time checksum with the original program checksum to obtain the program state of the subprogram;
and the original program segment repairing module is used for repairing the original program segment according to the program state.
A computer device comprises a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of the above single-bit flip fast repair method embodiments when executing the computer program.
A computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the above single-bit flip fast repair method embodiments.
According to the single-bit flip fast repair method and device, computer equipment, and storage medium described above, after the chip is powered on, the original program segment in the internal RAM is divided into a plurality of sub-programs; the sub-programs are copied as copy program segments; an original program checksum is computed for each copy program segment, and the original program checksums and the copy program segments are stored in an external RAM; when triggered by a timer, an interrupt response service is invoked to compute a real-time checksum of each sub-program, and the real-time checksum is compared with the original program checksum to obtain the program state of the sub-program; and the original program segment is repaired according to the program state. Repair is carried out mainly by invoking the interrupt response service. Compared with prior-art schemes that require an added memory controller for repair, no additional hardware resources are needed, and data read by the processor does not have to pass through the memory controller, which improves the real-time performance of data processing on the chip.
Drawings
FIG. 1 is a schematic flow chart of a single-bit flip fast repair method in one embodiment;
FIG. 2 is a schematic flow chart of a single-bit flip fast repair method in another embodiment;
FIG. 3 is a schematic flow chart of a single-bit flip fast repair method in yet another embodiment;
FIG. 4 is a block diagram of a single-bit flip fast repair device in one embodiment;
FIG. 5 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The single-bit flip fast repair method can be applied to ZYNQ chips in power systems, in particular to CSD-602 series merging unit devices. The CSD-602 series comprises a range of chip models that are suited to digital substations. The device sits at the process layer of a substation. It can collect analog signals from conventional current and voltage transformers and digital signals from electronic current and voltage transformers, and it sends sampled values (SV) over optical Ethernet, in accordance with IEC 61850-9-2, to bay-layer devices such as protection, measurement and control, and fault recording devices. Based on GOOSE messages sent by process-layer intelligent terminals, or on binary inputs acquired locally by the device, it can also determine the positions of disconnectors and circuit breakers in order to perform switching or paralleling functions. It can also communicate with other measurement and control devices and upload information such as their operating states, alarms, and remote signals.
The packaging material of the ZYNQ chip emits alpha particles, which can cause bit flips in the high-speed RAM inside the chip (RAM: Random Access Memory, also called main memory, the internal memory that exchanges data directly with the CPU). The memory outside the ZYNQ chip is DDR (DDR SDRAM, Double Data Rate Synchronous Dynamic Random Access Memory), where the probability of such flips is very low. The application therefore provides a bit-flip repair scheme aimed mainly at the RAM area inside the ZYNQ chip.
The RAM of the CSD-602 chip is mainly divided into internal RAM and external RAM, as shown in the following table:
[Table image (Figure BDA0003235680680000051) not reproduced: division of the chip RAM into internal RAM and external RAM]
In one embodiment, as shown in FIG. 1, a single-bit flip fast repair method is provided. The method is described using a processor (CPU) in a chip as an example and includes the following steps:
Step S101: after power-on, divide the original program segment in the internal RAM into a plurality of sub-programs.
The original program segment is the program code, written into the chip, that implements the application.
Specifically, after the chip is powered on, the original program segment in the internal RAM is divided into a plurality of sub-programs.
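Step S101 can be sketched as follows. The source does not say how the segment boundaries are chosen, so this Python sketch assumes fixed-size segments; `split_program` and `segment_size` are hypothetical names introduced for illustration.

```python
def split_program(program_length: int, segment_size: int) -> list:
    """Return a (start_offset, length) pair for each sub-program."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    return [(start, min(segment_size, program_length - start))
            for start in range(0, program_length, segment_size)]

segments = split_program(program_length=10, segment_size=4)
print(segments)  # [(0, 4), (4, 4), (8, 2)]
```

Smaller segments narrow down the location of an error more quickly but require more stored checksums; the right trade-off depends on the target system.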
Step S102: copy the plurality of sub-programs as copy program segments.
Specifically, the sub-programs are copied to form the copy program segments, while the original program segment itself remains in the internal RAM.
Step S103: compute an original program checksum for each copy program segment, and store the original program checksums and the copy program segments in the external RAM.
The original program checksum, also called the program check code, is a check code calculated according to a given rule. One example is a parity check code, which records whether the number of 1 bits in the code of each sub-program is odd or even; other check codes may also be used, and the method is not limited to parity.
Specifically, the processor calls the MISR (Master Interrupt Service Routine) to compute an original program checksum for each copy program segment, and stores each checksum together with its copy program segment in the external RAM.
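Steps S102 and S103 might look like the following Python sketch. The checksum rule is left open by the source, so a 32-bit additive checksum is assumed here purely for illustration; `checksum32`, `build_backup`, and the list standing in for the external RAM are hypothetical.

```python
def checksum32(data: bytes) -> int:
    """Additive checksum (an assumed rule): byte sum truncated to 32 bits."""
    return sum(data) & 0xFFFFFFFF

def build_backup(internal_ram: bytes, segments: list) -> list:
    """Copy each (start, length) sub-program and record its checksum."""
    backup = []
    for start, length in segments:
        copy = bytes(internal_ram[start:start + length])
        backup.append({"start": start, "length": length,
                       "copy": copy, "checksum": checksum32(copy)})
    return backup

internal_ram = bytearray(b"\x01\x02\x03\x04\x05\x06\x07\x08")
external_ram = build_backup(internal_ram, [(0, 4), (4, 4)])
print([entry["checksum"] for entry in external_ram])  # [10, 26]
```

An additive checksum changes whenever exactly one bit flips, which is the case this method targets; a CRC would be a stronger choice if multi-bit errors also need to be detected reliably.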
Step S104: when triggered by a timer, invoke an interrupt response service to compute a real-time checksum of each sub-program, and compare the real-time checksum with the original program checksum to obtain the program state of the sub-program.
Specifically, when triggered by the timer, the processor invokes the interrupt response service (for example, a 125 µs interrupt) to compute a real-time checksum of each sub-program in the internal RAM and compares it with the original program checksum stored in the external RAM to obtain the program state of the sub-program. If the two checksums differ, the sub-program contains an error; if they match, no bit flip has occurred in that sub-program.
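The periodic check in step S104 can be sketched in Python as below. In a real system this logic would run inside the timer interrupt service routine; here it is an ordinary function called by hand. The additive checksum and the names `checksum32` and `check_segments` are assumptions for illustration.

```python
def checksum32(data) -> int:
    """Additive checksum (an assumed rule): byte sum truncated to 32 bits."""
    return sum(data) & 0xFFFFFFFF

def check_segments(internal_ram, stored) -> list:
    """Return the indices of sub-programs whose live checksum mismatches."""
    bad = []
    for index, (start, length, original_checksum) in enumerate(stored):
        live = checksum32(internal_ram[start:start + length])
        if live != original_checksum:
            bad.append(index)
    return bad

internal_ram = bytearray(range(16))
stored = [(s, 8, checksum32(internal_ram[s:s + 8])) for s in (0, 8)]
before = check_segments(internal_ram, stored)   # nothing corrupted yet
internal_ram[10] ^= 0x04                         # simulate a single-bit flip
after = check_segments(internal_ram, stored)
print(before, after)  # [] [1]
```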
Step S105: repair the original program segment according to the program state.
Specifically, if the program state is a program error, the erroneous sub-program is repaired, and the original program segment as a whole is thereby repaired. The repair is performed either by restarting or by locating the flipped bit and flipping it back.
In this embodiment, after the chip is powered on, the original program segment in the internal RAM is divided into a plurality of sub-programs; the sub-programs are copied as copy program segments; an original program checksum is computed for each copy program segment, and the original program checksums and the copy program segments are stored in an external RAM; when triggered by a timer, an interrupt response service is invoked to compute a real-time checksum of each sub-program, and the real-time checksum is compared with the original program checksum to obtain the program state of the sub-program; and the original program segment is repaired according to the program state. Repair is carried out mainly by invoking the interrupt response service. Compared with prior-art schemes that require an added memory controller for repair, no additional hardware resources are needed, and data read by the processor does not have to pass through the memory controller, which improves the real-time performance of data processing on the chip.
In an embodiment, the step S104 includes: calling an interrupt response service to compare the real-time checksum with the original program checksum to obtain a comparison result; and if the comparison result is a check error, judging that the program state of the subprogram is a program error.
Specifically, when triggered by the timer, the processor invokes the interrupt response service (for example, a 125 µs interrupt) to compute a real-time checksum of each sub-program in the internal RAM and compares it with the original program checksum stored in the external RAM to obtain the program state of the sub-program. If the two checksums differ, the sub-program contains an error; if they match, no bit flip has occurred in that sub-program.
In the above embodiment, the real-time checksum is calculated and compared with the stored original program checksum to identify sub-programs in which a bit flip has occurred. This yields the program state of each sub-program and lays the groundwork for subsequent program repair.
In an embodiment, step S105 includes: if the program state is a program error, acquiring the start address of the corresponding erroneous sub-program in the internal RAM and the length of the space it occupies; comparing the erroneous sub-program with the corresponding copy program segment field by field, according to the start address and the length, to obtain the erroneous field; comparing the erroneous field with the corresponding copy field bit by bit to obtain the error bit number; and repairing the original program segment according to the error bit number.
FIG. 2 is a flow chart of a single-bit flip fast repair method in another embodiment. As shown in FIG. 2, the MISR divides the original program segment into several segments and checks each sub-program every 125 µs. If the check of a segment fails, the MISR returns the start address and length of that segment to the process in the processor that handles the application program. When the application process learns that the MISR program check has failed, it obtains the start address and length of the erroneous program segment and compares that segment, long by long, with the copy program segment backed up in the external RAM (a long is one field, generally 4 bytes, though it may be 8 bytes on some systems). This identifies the long that has changed. The bits of that long are then compared one by one to obtain the error bit number, that is, whether one bit or multiple bits have flipped.
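The field-by-field and bit-by-bit comparison can be sketched in Python as follows. The sketch assumes 4-byte fields read in little-endian order (the byte order is not stated in the source); `find_flipped_bits` is a hypothetical helper name.

```python
def find_flipped_bits(faulty: bytes, reference: bytes, word_size: int = 4) -> list:
    """Return (word_index, bit_index) for every bit that differs."""
    diffs = []
    for offset in range(0, len(reference), word_size):
        a = int.from_bytes(faulty[offset:offset + word_size], "little")
        b = int.from_bytes(reference[offset:offset + word_size], "little")
        delta = a ^ b              # 1 bits mark the positions that differ
        bit = 0
        while delta:
            if delta & 1:
                diffs.append((offset // word_size, bit))
            delta >>= 1
            bit += 1
    return diffs

reference = bytes(range(12))       # stands in for the backed-up copy
faulty = bytearray(reference)
faulty[5] ^= 0x80                  # simulate one flipped bit in word 1
print(find_flipped_bits(bytes(faulty), reference))  # [(1, 15)]
```

The length of the returned list is the error bit number: one entry means a single-bit flip, more than one means multiple bits have changed.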
In this embodiment, the sub-program and the copy program segment are compared field by field through the interrupt response service, so the erroneous field can be located precisely, which facilitates accurate repair.
In an embodiment, the repairing the original program segment according to the error bit number includes: if the error bit number is only one, flipping the corresponding erroneous bit to obtain the corrected sub-program; invoking the interrupt response service to check the corrected sub-program to obtain a check result; and if every check result within a preset time period indicates that the program is normal, determining that the repair is successful.
As shown in FIG. 2, if only 1 bit of the erroneous program segment has changed, that bit is repaired from the backed-up copy program segment; if no bit has changed, the program continues to execute normally. The subsequent MISR program checks then continue. If no MISR check exception occurs within a preset time period (for example, 1 minute), the repair is deemed successful; if a MISR check error occurs again within the preset time period, the whole chip is restarted immediately.
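The repair-then-observe logic can be sketched in Python as below. The class `RepairMonitor`, its return strings, and the handling of an error that appears after the observation period (reported here as a fresh error) are assumptions made for illustration; the 60-second default mirrors the 1-minute example in the text.

```python
def flip_back(ram: bytearray, byte_offset: int, bit: int) -> None:
    """Repair a single-bit flip by inverting the affected bit in place."""
    ram[byte_offset] ^= 1 << bit

class RepairMonitor:
    """Tracks whether a repair holds for a preset observation period."""

    def __init__(self, repaired_at_s: float, period_s: float = 60.0):
        self.repaired_at_s = repaired_at_s
        self.period_s = period_s

    def on_check(self, now_s: float, checksum_ok: bool) -> str:
        within = (now_s - self.repaired_at_s) < self.period_s
        if not checksum_ok:
            # Recurrence inside the window means the repair did not hold.
            return "restart" if within else "new_error"
        return "observing" if within else "repair_successful"

ram = bytearray(b"\x24")           # corrupted byte ('$')
flip_back(ram, 0, 4)               # restores 0x34 ('4')
monitor = RepairMonitor(repaired_at_s=0.0)
print(monitor.on_check(10.0, True), monitor.on_check(61.0, True))
# prints: observing repair_successful
```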
In this embodiment, the program is checked again after the repair, so the chip does not need to be restarted every time, which improves its data processing efficiency.
In an embodiment, the internal RAM includes a data constant area, and before the interrupt response service is invoked to check the corrected sub-program, the method further includes:
determining whether the data constant area has changed; if it has, copying the corresponding data from the external backup data area into the data constant area, wherein the external backup data area is obtained by copying the data of the original data constant area to the external RAM at power-on.
Specifically, if only 1 bit of the program segment has changed, the bit is repaired from the backed-up program area. At the same time, it is determined whether the data constant area (for example, scale coefficients, 9-2 transmission channels, and the configuration area) has changed; if so, the external backup area is copied into the corresponding data constant area. If nothing has changed, the program continues to execute normally. The external backup data area is obtained by copying the data of the original data constant area to the external RAM at power-on.
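Restoring the data constant area from its power-on backup can be sketched in Python as follows. The function name `restore_constants` and the sample byte values are hypothetical; the source does not describe how the change is detected, so a direct comparison with the backup is assumed.

```python
def restore_constants(constants: bytearray, backup: bytes) -> bool:
    """Overwrite the constant area with its backup if they differ.

    Returns True when a restore was performed, False when nothing changed.
    """
    if bytes(constants) == backup:
        return False
    constants[:] = backup          # copy the backup over the live area
    return True

constants = bytearray(b"\x10\x20\x30")   # e.g. scale coefficients
backup = bytes(constants)                 # taken once at power-on
constants[1] ^= 0x01                      # simulate a corrupted constant
print(restore_constants(constants, backup), bytes(constants) == backup)
# prints: True True
```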
According to the embodiment, the data constant region is verified and repaired, so that the accuracy of the whole program is improved.
In one embodiment, if the error bit number is greater than one, a restart instruction is triggered to restart the whole chip.
Specifically, if more than 1 bit has changed in the erroneous program segment, the program area is considered to have been corrupted by a faulty program, and the whole chip is restarted.
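The decision between flipping a bit back and restarting can be written as a small dispatch on the error bit number. This Python sketch uses hypothetical names (`count_error_bits`, `choose_action`) and illustrative return strings; it is one possible reading of the policy described above.

```python
def count_error_bits(faulty_word: int, reference_word: int) -> int:
    """Number of bit positions in which the two words differ."""
    return bin(faulty_word ^ reference_word).count("1")

def choose_action(error_bits: int) -> str:
    """Map the error bit number to a repair action."""
    if error_bits == 0:
        return "continue"          # nothing changed, keep running
    if error_bits == 1:
        return "flip_bit"          # single-bit flip: repair in place
    return "restart_chip"          # multiple bits: restart the whole chip

reference = 0x12345678
print(choose_action(count_error_bits(reference ^ 0x00000100, reference)))  # flip_bit
print(choose_action(count_error_bits(reference ^ 0x00000101, reference)))  # restart_chip
```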
In this embodiment, the whole chip is restarted when multiple bits have flipped, which keeps program segment repair efficient.
In one embodiment, once a MISR check exception is found, the backed-up copy program segment is copied directly into the internal RAM, and the externally backed-up data constant area is copied into the data constant area of the chip's internal RAM. During judgment and repair, that is, while the 125 µs interrupt is being serviced, all other programs stop running, including acquisition and SV (sampled value) transmission. After the program has been repaired, data acquisition and SV reception are re-enabled first; then, after a delay of 1.5 ms, SV transmission is re-enabled once 5 points in the real-time data area contain newly acquired data. If a MISR or other checksum error occurs again within 1 hour, the repair is deemed to have failed and the whole chip is restarted. In the merging unit program, the SV-related code is placed in the program area of the chip's internal RAM, so the program in this area must be guaranteed to be completely correct.
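The condition for resuming SV transmission after a repair can be sketched as a small gate. This Python sketch reads the text as requiring both the 1.5 ms delay and 5 newly acquired points; that reading, the class name `SvTransmitGate`, and the 250 µs acquisition interval used in the example are all assumptions, not details confirmed by the source.

```python
class SvTransmitGate:
    """Decides when SV transmission may resume after a repair."""

    def __init__(self, repaired_at_us: int, delay_us: int = 1500,
                 required_points: int = 5):
        self.repaired_at_us = repaired_at_us
        self.delay_us = delay_us
        self.required_points = required_points
        self.fresh_points = 0

    def on_new_point(self, now_us: int) -> bool:
        """Record one newly acquired point; True if transmission may resume."""
        self.fresh_points += 1
        delay_elapsed = now_us - self.repaired_at_us >= self.delay_us
        return delay_elapsed and self.fresh_points >= self.required_points

gate = SvTransmitGate(repaired_at_us=0)
# Hypothetical acquisition: one new point every 250 microseconds.
ready_times = [t for t in range(250, 2001, 250) if gate.on_new_point(t)]
print(ready_times[0])  # 1500
```

With these example numbers, 5 fresh points are available at 1250 µs, but transmission is held back until the 1.5 ms delay has also elapsed.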
In an embodiment, FIG. 3 shows a flow chart of a single-bit flip fast repair method in yet another embodiment. The precondition of this repair method is that the case in which the backup program area in the external RAM and the program area in the chip's internal RAM are both corrupted within the same 1-hour period is not considered.
After power-on initialization, the internal program area of the DSP (0x12460C to 0x13BFFF) is copied to the external RAM, and a program checksum is computed over the backup program area (located in the data area of the DDR); the external data RAM has enough space to hold the backup program area. Every hour, the checksum of the backup program is checked. If the number of abnormal results exceeds a preset count, for example 3, the abnormality is repaired; if the count exceeds 3 and the abnormality still has not been repaired, the chip is re-checked.
According to the embodiment, hardware resource overhead does not need to be increased, and the real-time performance of data processing is improved.
It should be understood that although the steps in the flow charts of FIGS. 1-3 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in FIGS. 1-3 may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 4, there is provided a single-bit flipped fast repair apparatus 400, comprising: a subprogram segmentation module 401, a duplicate program segment copying module 402, an original program checksum obtaining module 403, a program state judgment module 404, and an original program segment repairing module 405, wherein:
the subprogram segmentation module 401 is used for dividing the original program segment in the internal RAM into a plurality of sections of subprograms after being powered on;
a duplicate program segment copying module 402, configured to copy the plurality of segments of subroutines as a duplicate program segment;
an original program checksum obtaining module 403, configured to obtain an original program checksum for each copy program segment, and store the original program checksum and the copy program segment in an external RAM;
a program state decision module 404, configured to invoke an interrupt response service to obtain a real-time checksum for each sub-program under the trigger of a timer, and compare the real-time checksum with the original program checksum to obtain a program state of the sub-program;
an original program segment repairing module 405, configured to repair the original program segment according to the program state.
In an embodiment, the program state decision module 404 is further configured to: calling an interrupt response service to compare the real-time checksum with the original program checksum to obtain a comparison result; and if the comparison result is a check error, judging that the program state of the subprogram is a program error.
In an embodiment, the original program segment repairing module 405 is further configured to, if the program state is a program error, obtain an initial address of a corresponding error subprogram in the internal RAM and a length of a program occupied space; comparing the error subprogram with the corresponding copy program segment field by field according to the initial address and the length of the occupied space of the program to obtain an error field; comparing the error field with the corresponding copy field bit by bit to obtain an error bit number; and repairing the original program segment according to the error digit.
In an embodiment, the original program segment repairing module 405 is further configured to: if the error bit number is only one, flip the corresponding erroneous bit to obtain the corrected sub-program; invoke the interrupt response service to check the corrected sub-program to obtain a check result; and if every check result within a preset time period indicates that the program is normal, determine that the repair is successful.
In one embodiment, the internal RAM includes a data constant area; the original program segment repairing module 405 is further configured to determine whether the data constant area has changed and, if it has, to copy the corresponding data of the external backup data area into the data constant area, wherein the external backup data area is obtained by copying the data of the original data constant area to the external RAM at power-on.
In one embodiment, if the error bit number comprises multiple bits, a restart instruction is triggered to restart the entire chip.
In one embodiment, when the interrupt response service executes, other programs stop running.
For specific limitations of the single-bit flip fast repair device, reference may be made to the limitations of the single-bit flip fast repair method above, which are not repeated here. Each module of the single-bit flip fast repair device may be implemented wholly or partly in software, hardware, or a combination of the two. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke them and perform the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in fig. 5. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor provides computing and control capabilities. The memory comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment in which the operating system and the computer program in the non-volatile storage medium run. The database stores application data. The network interface communicates with an external terminal through a network connection. When executed by the processor, the computer program implements a single-bit flip fast repair method.
Those skilled in the art will appreciate that the architecture shown in fig. 5 is merely a block diagram of the part of the structure relevant to the disclosed solution and does not limit the computer devices to which the solution applies; a particular computer device may include more or fewer components than shown, combine certain components, or arrange the components differently.
In one embodiment, a computer device is provided, which includes a memory and a processor. The memory stores a computer program, and the processor implements the steps of the above single-bit flip fast repair method embodiments when executing the computer program.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program implements the steps of the above single-bit flip fast repair method embodiments.
Those skilled in the art will understand that all or part of the processes of the method embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not every possible combination of these features is described; however, any combination that involves no contradiction should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, and these all fall within the scope of protection of the present application. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims (10)

1. A single-bit flip fast repair method, comprising:
after power-on, dividing an original program segment in an internal RAM into a plurality of subprograms;
copying the plurality of subprograms as copy program segments;
computing an original program checksum for each copy program segment, and storing the original program checksum and the copy program segment in an external RAM;
under the trigger of a timer, calling an interrupt response service to obtain a real-time checksum of each subprogram, and comparing the real-time checksum with the original program checksum to obtain the program state of the subprogram;
and repairing the original program segment according to the program state.
2. The method according to claim 1, wherein the calling an interrupt response service to obtain a real-time checksum for each sub-program under the trigger of a timer, and comparing the real-time checksum with the original program checksum to obtain the program status of the sub-program comprises:
calling an interrupt response service to compare the real-time checksum with the original program checksum to obtain a comparison result;
and if the comparison result is a check error, judging that the program state of the subprogram is a program error.
3. The method of claim 2, wherein the repairing the original program segment according to the program state comprises:
if the program state is a program error, acquiring the start address in the internal RAM of the corresponding erroneous subprogram and the length of the space occupied by the program;
comparing the erroneous subprogram with the corresponding copy program segment field by field, according to the start address and the length of the occupied space, to obtain an error field;
comparing the error field with the corresponding copy field bit by bit to obtain a number of erroneous bits;
and repairing the original program segment according to the number of erroneous bits.
4. The method of claim 3, wherein the repairing the original program segment according to the number of erroneous bits comprises:
if the number of erroneous bits is exactly one, flipping the erroneous bit to obtain an error-corrected subprogram;
calling the interrupt response service to check the corrected subprogram to obtain a check result;
and if every check result within a preset time period indicates that the program is normal, determining that the repair is successful.
5. The method of claim 4, wherein the internal RAM includes a data constant region; before the calling the interrupt response service to check the corrected subprogram, the method further includes:
determining whether the data constant region has changed;
if the data constant region has changed, copying the corresponding data of an external backup data region into the data constant region; wherein the external backup data region is obtained by copying data in the original data constant region to the external RAM at power-on.
6. The method of claim 4, wherein if the number of erroneous bits comprises multiple bits, then triggering a restart instruction to restart the entire chip.
7. The method of any of claims 1 to 6, wherein other programs are stopped while the interrupt response service is executing.
8. A single-bit flip fast repair device, the device comprising:
the subprogram segmentation module is used for dividing the original program segment in the internal RAM into a plurality of subprograms after power-on;
the copy program segment copying module is used for copying the plurality of subprograms as copy program segments;
the original program checksum solving module is used for computing an original program checksum for each copy program segment and storing the original program checksum and the copy program segment in an external RAM;
the program state judgment module is used for calling an interrupt response service to obtain a real-time checksum of each subprogram under the trigger of a timer, and comparing the real-time checksum with the original program checksum to obtain the program state of the subprogram;
and the original program segment repairing module is used for repairing the original program segment according to the program state.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202111001156.8A 2021-08-30 2021-08-30 Single-bit overturning fast repairing method and device, computer equipment and storage medium Pending CN113626246A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111001156.8A CN113626246A (en) 2021-08-30 2021-08-30 Single-bit overturning fast repairing method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113626246A true CN113626246A (en) 2021-11-09

Family

ID=78388241

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111001156.8A Pending CN113626246A (en) 2021-08-30 2021-08-30 Single-bit overturning fast repairing method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113626246A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101901174A (en) * 2010-07-28 2010-12-01 西安交通大学 Method for enhancing reliability of program of multi-replica contrast mechanism based on code segment
CN102508742A (en) * 2011-11-03 2012-06-20 中国人民解放军国防科学技术大学 Kernel code soft fault tolerance method for hardware unrecoverable memory faults
CN111552590A (en) * 2020-04-16 2020-08-18 国电南瑞科技股份有限公司 Detection and recovery method and system for memory bit overturning of power secondary equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination