WO2017215377A1

WO2017215377A1 - Method and device for processing hard memory error

Info

Publication number: WO2017215377A1
Application number: PCT/CN2017/083815
Authority: WO
Inventors: 张晔
Original assignee: 中兴通讯股份有限公司
Priority date: 2016-06-16
Filing date: 2017-05-10
Publication date: 2017-12-21
Also published as: CN107516547A

Abstract

A method and device for processing a hard memory error, the method comprising: determining that a hard error failure occurs at a first address of a memory (S102); performing error correction on memory information in the first address, and storing the memory information after error correction in a failure-free second address in the memory (S104); and inserting a hardware breakpoint at the first address, wherein the hardware breakpoint is used for monitoring whether the first address is accessed, and jumps from an access instruction for the first address to an access instruction for the second address (S106).

Description

Memory hard error processing method and device

Technical field

The present application relates to, but is not limited to, the field of communications, and in particular, to a method and apparatus for processing hard memory errors.

Background

Memory occupies a critical position in any computer system. All data used while the system runs, including the program itself, is stored in memory. If a memory error occurs while a program is running, the program may malfunction and the system may crash, so stable memory operation is essential. A system's physical memory generally consists of several memory chips. Each chip contains many memory cells, and each cell stores one bit of data. A memory error may affect one bit or multiple bits. By cause, memory errors are generally classified into soft errors and hard errors. Soft errors occur randomly; for example, sudden electronic interference near the memory may cause one. Memory with an ECC (Error Checking and Correcting) function can detect soft errors and correct them automatically. A hard error is caused by hardware damage or a defect, so the data is always incorrect; ECC can detect such an error but cannot correct it. In some important embedded application scenarios, such as core carrier-class routers and switches, the system continuously carries a large volume of user services. If a memory failure (here, mainly a memory hard error) occurs in such a system, there is generally no remedy other than replacing the memory. Replacement, however, interrupts service for a period of time, with serious consequences.

Summary of invention

The following is an overview of the topics detailed in this document. This Summary is not intended to limit the scope of the claims.

The embodiments of the present invention provide a method and a device for processing a memory hard error, so that a memory hard error can be handled without interrupting service.

According to an embodiment of the present invention, a method for processing a memory hard error is provided, including: determining that a hard error fault has occurred at a first address of a memory; performing error correction on the memory information in the first address, and storing the error-corrected memory information at a fault-free second address in the memory; and inserting a hardware breakpoint at the first address, wherein the hardware breakpoint is used to monitor whether the first address is accessed and to jump from an access instruction for the first address to an access instruction for the second address.

In an embodiment, determining that a hard error fault has occurred at the first address of the memory comprises: receiving an error checking and correcting (ECC) interrupt signal reported by the memory; and looking up the corresponding first address in the ECC error capture address register according to the ECC interrupt signal.

In an embodiment, after looking up the corresponding first address in the ECC error capture address register according to the ECC interrupt signal, the method further includes: performing a predetermined number of read/write tests on the memory information of the first address to determine whether the first address is faulty; and when the test result indicates that the first address is faulty, determining that a hard error fault has occurred at the first address.

In an embodiment, jumping from the access instruction for the first address to the access instruction for the second address comprises: receiving breakpoint exception information triggered by the hardware breakpoint in response to an access operation on the first address; and performing at least one of the following: when the breakpoint exception information indicates that the memory information of the first address is instruction information, jumping from an access instruction that reads the instruction information of the first address to an access instruction for the second address; when the breakpoint exception information indicates that the memory information of the first address is data information, jumping from an access instruction that reads the data information of the first address to an access instruction for the second address; when the breakpoint exception information indicates that the memory information of the first address is data information, jumping from an access instruction that writes the data information of the first address to an access instruction for the second address.

In an embodiment, jumping from the access instruction for the first address to the access instruction for the second address comprises: receiving an access instruction for the first address; calculating the deviation from the first address to the second address; and correcting the first address by the deviation to obtain the second address, and jumping to the access instruction for the second address.

In an embodiment, after jumping to the second address, the method further comprises: jumping to a next instruction of an access instruction to the first address.

According to another embodiment of the present invention, a memory hard error processing apparatus is provided, including: a determining module configured to determine that a hard error fault has occurred at a first address of a memory; an error correction module configured to perform error correction on the memory information in the first address and to store the error-corrected memory information at a fault-free second address in the memory; and a jump module configured to insert a hardware breakpoint at the first address, wherein the hardware breakpoint is used to monitor whether the first address is accessed and to jump from an access instruction for the first address to an access instruction for the second address.

In an embodiment, the determining module includes: a first receiving unit configured to receive the error checking and correcting (ECC) interrupt signal reported by the memory; and a searching unit configured to look up the corresponding first address in the ECC error capture address register according to the ECC interrupt signal.

In an embodiment, the determining module further includes: a testing unit configured to, after the searching unit looks up the corresponding first address in the ECC error capture address register according to the ECC interrupt signal, perform a predetermined number of read/write tests on the memory information of the first address to determine whether the first address is faulty; and a determining unit configured to determine that a hard error fault has occurred at the first address when the test result indicates that the first address is faulty.

In an embodiment, the jump module further includes: a second receiving unit configured to receive breakpoint exception information triggered by the hardware breakpoint in response to an access operation on the first address; a first jump unit configured to, when the breakpoint exception information indicates that the memory information of the first address is instruction information, jump from an access instruction that reads the instruction information of the first address to an access instruction for the second address; and a second jump unit configured to, when the breakpoint exception information indicates that the memory information of the first address is data information, perform at least one of the following: jumping from an access instruction that reads the data information of the first address to an access instruction for the second address; jumping from an access instruction that writes the data information of the first address to an access instruction for the second address.

According to still another embodiment of the present invention, a storage medium is also provided. The storage medium is arranged to store program code for performing the following steps:

Determining that the first address of the memory has a hard error fault;

Performing error correction on the memory information in the first address, and storing the error-corrected memory information at a fault-free second address in the memory;

Inserting a hardware breakpoint at the first address, wherein the hardware breakpoint is used to monitor whether the first address is accessed and to jump from an access instruction for the first address to an access instruction for the second address.

The embodiment of the present invention first determines that a hard error fault has occurred at the first address of the memory, then performs error correction on the memory information in the first address and stores the error-corrected memory information at a fault-free second address in the memory, and finally inserts a hardware breakpoint at the first address, wherein the hardware breakpoint is used to monitor whether the first address is accessed and to jump from an access instruction for the first address to an access instruction for the second address. Because the memory information of the first address, where the hard error fault occurred, is moved to the second address, and any instruction that accesses the original first address is redirected to an access instruction for the second address, the memory information can still be accessed without interrupting service or replacing the memory. This works around the uncorrectable memory cell, avoids serious consequences such as program errors or system crashes, and improves the stability of the system.

Other aspects will be apparent upon reading and understanding the drawings and detailed description.

Brief description of the drawings

FIG. 1 is a flow chart of a method for processing a memory hard error according to an embodiment of the present invention;

FIG. 2 is a structural block diagram of a memory hard error processing apparatus according to an embodiment of the present invention;

FIG. 3 is a first structural block diagram of a memory hard error processing apparatus according to an embodiment of the present invention;

FIG. 4 is a second structural block diagram of a memory hard error processing apparatus according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of a process of generating a device according to an embodiment of the present invention;

FIG. 6 is a flow chart of creating the special hardware data breakpoint exception handling function Vector1 in FIG. 5;

FIG. 7 is a flow chart of creating the special hardware instruction breakpoint exception handling function Vector2 in FIG. 5;

FIG. 8 is a flow chart of modifying the system's original hardware breakpoint exception handling function Vector in FIG. 5;

FIG. 9 is a flow chart of creating the memory ECC interrupt processing function vector_ecc in FIG. 5;

FIG. 10 is a flow chart of the processing performed when an ECC check error occurs according to an embodiment of the present invention;

FIG. 11 is a flow chart of the processing performed when a hardware breakpoint exception occurs according to an embodiment of the present invention.

Detailed description

The present application will be described in detail below with reference to the drawings in conjunction with the embodiments. It should be noted that the embodiments in the present application and the features in the embodiments may be combined with each other without conflict.

It should be noted that the terms "first", "second" and the like in the specification and claims of the present application and the above-mentioned drawings are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence.

Example 1

In this embodiment, a method for processing a memory hard error is provided. FIG. 1 is a flowchart of a method for processing a memory hard error according to an embodiment of the present invention. As shown in FIG. 1, the process includes the following steps:

Step S102, determining that a hard error occurs in the first address of the memory;

Step S104, performing error correction on the memory information in the first address, and storing the error-corrected memory information in the second address in the memory without failure;

Step S106, inserting a hardware breakpoint at the first address, wherein the hardware breakpoint is used to monitor whether the first address is accessed, and jumps from the access instruction to the first address to the access instruction to the second address.

Through the above steps, it is first determined that a hard error fault has occurred at the first address of the memory; error correction is then performed on the memory information in the first address, and the error-corrected memory information is stored at a fault-free second address in the memory; finally, a hardware breakpoint is inserted at the first address, wherein the hardware breakpoint is used to monitor whether the first address is accessed and to jump from an access instruction for the first address to an access instruction for the second address. Because the memory information of the first address, where the hard error fault occurred, is moved to the second address, and any instruction that accesses the original first address is redirected to an access instruction for the second address, the memory information can still be accessed without interrupting service or replacing the memory. This works around the uncorrectable memory cell, avoids serious consequences such as program errors or system crashes, and improves the stability of the system. The above steps may be executed by, but are not limited to, a processor, a CPU, or a memory management unit.

In an embodiment, determining that the first address of the memory has a hard error fault includes:

S11. Receive an error checking and correcting (ECC) interrupt signal reported by the memory.

S12. Find a corresponding first address in the ECC error trap address register according to the ECC interrupt signal.

After searching for the corresponding first address in the ECC error-capture address register according to the ECC interrupt signal, in order to ensure that the fault is indeed a hard-error fault, the storage function of the first address may be detected, and after step S12, the method may further include:

S13. Perform a predetermined number of read/write tests on the memory information of the first address to determine whether the first address is faulty.

S14. When the test result indicates that the first address is faulty, determine that the first address has a hard error fault.
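The check in S11 to S14 can be sketched as follows. This is a minimal simulation rather than the patented implementation: the `SimulatedMemory` class, the stuck-bit fault model, the 8-bit word size, and the `rounds` parameter are all assumptions introduced for illustration.

```python
class SimulatedMemory:
    """Toy memory; `stuck` maps an address to (bit, level) to model a hard fault."""

    def __init__(self, stuck=None):
        self.cells = {}
        self.stuck = stuck or {}

    def write(self, addr, value):
        self.cells[addr] = value & 0xFF

    def read(self, addr):
        value = self.cells.get(addr, 0)
        if addr in self.stuck:
            bit, level = self.stuck[addr]
            # A damaged cell always returns the same level, whatever was written.
            value = (value | (1 << bit)) if level else (value & ~(1 << bit))
        return value


def is_hard_error(mem, addr, rounds=4):
    """S13/S14: write all-0 and all-1 patterns `rounds` times and read them back."""
    for _ in range(rounds):
        for pattern in (0x00, 0xFF):
            mem.write(addr, pattern)
            if mem.read(addr) != pattern:
                return True   # the fault persists: treat it as a hard error
    return False              # every pattern read back correctly: soft error


mem = SimulatedMemory(stuck={0x1000: (3, 1)})  # bit 3 of 0x1000 is stuck at 1
print(is_hard_error(mem, 0x1000))  # address reported by the ECC interrupt
print(is_hard_error(mem, 0x2000))  # healthy address
```

Note that the test overwrites the cell, so in a real handler the corrected data would have to be computed before the test runs.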

In an embodiment, jumping from an access instruction to the first address to an access instruction to the second address comprises:

S21. Receive a breakpoint abnormality information triggered by an access operation of the hardware breakpoint for the first address.

S22. Perform at least one of the following: when the breakpoint exception information indicates that the memory information of the first address is instruction information, jump from an access instruction that reads the instruction information of the first address to an access instruction for the second address; when the breakpoint exception information indicates that the memory information of the first address is data information, jump from an access instruction that reads the data information of the first address to an access instruction for the second address; when the breakpoint exception information indicates that the memory information of the first address is data information, jump from an access instruction that writes the data information of the first address to an access instruction for the second address.

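In an embodiment, jumping from an access instruction for the first address to an access instruction for the second address comprises:

S31. Receive an access instruction to the first address.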

S32. Calculate a deviation from the first address to the second address.

S33. Correct the first address by the deviation to obtain the second address, and jump to the access instruction for the second address.
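The address arithmetic in S31 to S33 can be illustrated with a short sketch. The function names and the example addresses are hypothetical; the text only states that a deviation between the two addresses is calculated and applied.

```python
def compute_offset(a_error, a_ok):
    """S32: signed deviation from the faulty first address to the second address."""
    return a_ok - a_error


def redirect(access_addr, a_error, a_ok):
    """S33: apply the deviation when the access targets the faulty address."""
    if access_addr != a_error:
        return access_addr          # unrelated access: leave it unchanged
    return access_addr + compute_offset(a_error, a_ok)


A_ERROR = 0x8000_1230   # hypothetical faulty first address
A_OK = 0x8FFF_0000      # hypothetical fault-free second address

print(hex(redirect(A_ERROR, A_ERROR, A_OK)))
print(hex(redirect(0x8000_1234, A_ERROR, A_OK)))
```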

In an embodiment, after the operation instruction for the second address has been executed, execution jumps to the instruction following the one that generated the breakpoint exception (that is, the operation instruction that accessed the first address) and continues from there.

In an embodiment, the jump operation can be implemented in software by different functions, which can be constructed as assembly language or binary code. This embodiment does not limit the language (for example, C or Java) used to construct the jump functions:

After the system is started, A_ok (specified fault-free special memory address) is used to save the error-corrected data D_ok.

Create a special hardware data breakpoint exception handler Vector1 for memory addresses that hold data information. Specify three special fault-free memory locations: A1_code1 (Vector1 entry address), A1_code2 (address of the corrected data read/write instruction) and A1_stack (Vector1 stack frame address). Place a function in the form of assembly code at A1_code1. The function content is:

1. Save the register values of the breakpoint exception context to A1_stack.

2. Analyze the instruction code C1_old that triggered the hardware breakpoint, calculate the deviation between the designated memory address A_ok that holds the correct data and the memory address A_error where the ECC error occurred, and use this deviation to correct and recreate the instruction code as C1_new1 (the corrected data read/write instruction).

3. Place the newly created instruction C1_new1 at the designated memory address A1_code2.

4. After the new instruction C1_new1, append a branch instruction C1_new2 that jumps to the address of the assembly instruction following the instruction C_old that triggered the breakpoint (C_old + length of C_old).

5. Restore the register values of the breakpoint exception context from the address A1_stack.

6. Jump to the newly created instruction C1_new1.

Create a special hardware instruction breakpoint exception handler Vector2 for memory addresses that hold instruction information. After the system starts, specify two special memory locations: A2_code (Vector2 entry address) and A2_stack (Vector2 stack frame address). Place a function in the form of binary code at A2_code. The function content is:

1. After the designated correct-data address A_ok (the designated fault-free special memory address, which holds the corrected original program instruction from A_error), append the assembly instruction C2_new. C2_new is a jump from its own address (A_ok + the length of the assembly instruction represented by the data D_ok at A_ok) to the address of the assembly instruction following the instruction C_old that triggered the breakpoint (C_old + length of C_old).

2. Restore the register values of the breakpoint exception context from the address A2_stack.

3. Jump to A_ok.
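The instruction rewriting performed by Vector1 can be modeled at a high level as follows. This is only a sketch under strong assumptions: real handlers work on machine encodings, whereas the `Instr` record, its fields, and the fixed 4-byte length used here are hypothetical stand-ins chosen to make the address arithmetic visible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Toy instruction: opcode, operand address, and encoded length in bytes."""
    op: str
    target: int
    length: int = 4


def build_vector1_stub(c_old, c_old_addr, a_error, a_ok):
    """Return the two instructions that Vector1 would place at A1_code2."""
    offset = a_ok - a_error                       # deviation between the two addresses
    # C1_new1: same operation as the trapping instruction, retargeted at A_ok.
    c1_new1 = Instr(c_old.op, c_old.target + offset, c_old.length)
    # C1_new2: branch to the instruction that follows the trapping one.
    c1_new2 = Instr("branch", c_old_addr + c_old.length)
    return [c1_new1, c1_new2]


c_old = Instr("load", 0x8000_1230)                # load that hit the data breakpoint
stub = build_vector1_stub(c_old, c_old_addr=0x0040_0100,
                          a_error=0x8000_1230, a_ok=0x8FFF_0000)
for instr in stub:
    print(instr.op, hex(instr.target))
```

Vector2 needs no rewriting of the operand: it only appends a return branch after the corrected instruction at A_ok and jumps there.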

Through the description of the above embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application can be embodied in the form of a software product that is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disk) and includes a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods of the various embodiments of the present invention.

Example 2

This embodiment also provides a memory hard error processing device, which is used to implement the foregoing embodiments and implementation manners; what has already been described is not repeated. As used below, the term "module" refers to a combination of software and/or hardware that implements a predetermined function. Although the devices described in the following embodiments may be implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.

FIG. 2 is a structural block diagram of a memory hard error processing apparatus according to an embodiment of the present invention. As shown in FIG. 2, the apparatus includes:

The determining module 20 is configured to determine that the first address of the memory has a hard error fault;

The error correction module 22 is configured to perform error correction on the memory information in the first address, and store the error-corrected memory information in the second address in the memory without failure;

The jump module 24 is configured to insert a hardware breakpoint at the first address, wherein the hardware breakpoint is used to monitor whether the first address is accessed and to jump from an access instruction for the first address to an access instruction for the second address.

FIG. 3 is a block diagram showing the structure of a memory hard error processing apparatus according to an embodiment of the present invention. As shown in FIG. 3, the determining module 20 includes:

The first receiving unit 30 is configured to receive an error detection correction ECC interrupt signal reported by the memory;

The searching unit 32 is configured to look up the corresponding first address in the ECC error trapping address register according to the ECC interrupt signal.

In another embodiment, the determining module 20 further includes:

The test unit 34 is configured to, after the searching unit looks up the corresponding first address in the ECC error capture address register according to the ECC interrupt signal, perform a predetermined number of read/write tests on the memory information of the first address to determine whether the first address is faulty;

The determining unit 36 is configured to determine that a hard error fault occurs at the first address when the test result indicates that the first address is faulty.

FIG. 4 is a second structural block diagram of a memory hard error processing apparatus according to an embodiment of the present invention. As shown in FIG. 4, in addition to all the modules shown in FIG. 2, the jump module 24 of the device further includes:

The second receiving unit 40 is configured to receive breakpoint abnormality information triggered by an access operation of the hardware breakpoint for the first address;

The first jump unit 42 is configured to, when the breakpoint exception information indicates that the memory information of the first address is instruction information, jump from an access instruction that reads the instruction information of the first address to an access instruction for the second address;

The second jump unit 44 is configured to, when the breakpoint exception information indicates that the memory information of the first address is data information, perform at least one of the following: jumping from an access instruction that reads the data information of the first address to an access instruction for the second address; jumping from an access instruction that writes the data information of the first address to an access instruction for the second address.

It should be noted that each of the above modules may be implemented by software or hardware. In the latter case, this may be achieved in, but is not limited to, the following ways: the above modules are all located in the same processor; or the above modules are distributed, in any combination, across different processors.

Example 3

This embodiment relates to a method for keeping an embedded system operating normally, without a restart, when a storage cell in a memory chip suffers irreparable hardware damage.

This embodiment provides a method and device with which, when an uncorrectable hard error occurs in the memory of an embedded system, the system can continue to operate normally without restarting, which can significantly improve the stability of data communication products in the field.

In an embedded CPU architecture, the memory controller supports ECC checking and error correction. As long as the memory in use also supports ECC checking, an ECC interrupt can be reported to the CPU when a memory error occurs. After the ECC error occurs, the CPU provides the error address in the ECC error capture address register, and the operating system can then handle the ECC interrupt accordingly. In addition, the CPU provides a hardware breakpoint function that monitors reads and writes of a specified memory address. Such accesses include data reads/writes and instruction fetches; once the specified memory address is accessed in either way, an exception is reported to the operating system.

This application uses these two CPU functions. After an ECC interrupt occurs, the correct data is first calculated and stored at a specific memory address where no ECC error has occurred, and a hardware breakpoint is then inserted at the memory address where the ECC error occurred. There are two kinds of hardware breakpoints. One is the instruction breakpoint, which triggers a breakpoint exception when the CPU fetches an instruction from the memory address; when the exception occurs, the address of the instruction itself equals the breakpoint address. The other is the data breakpoint, which triggers a breakpoint exception when the CPU reads or writes data at the memory address; when the exception occurs, the target address of the instruction's operation equals the breakpoint address. Once a subsequent program touches the faulty memory address again, a special processing flow is entered to bypass it: if a data breakpoint is triggered, the program reads/writes the address that holds the corrected data; if an instruction breakpoint is triggered, execution jumps directly to the address that holds the corrected data.

In this way the system can ignore hard errors in memory. With this device, equipment that is very sensitive to service stability, such as a carrier-grade data communication product, can continue to operate with its services unaffected even if a fatal hardware failure such as a memory hard error occurs.

FIG. 5 is a schematic diagram of a process of generating a device according to an embodiment of the present invention. As shown in FIG. 5, the implementation steps of this embodiment include:

In step S501, the system starts.

In step S502, A_ok (specified non-faulty special memory address) is used to save the error-corrected data D_ok.

Step S503, creating a special hardware data breakpoint exception handling function Vector1.

Wherein, referring to FIG. 6, the following steps are included:

In step S601, three special fault-free memory locations A1_code1 (Vector1 entry address), A1_code2 (address of the corrected data read/write instruction) and A1_stack (Vector1 stack frame address) are specified.

Step S602, placing a function in the form of assembly code at A1_code1. The function content is:

1. Save the register values of the breakpoint exception context to A1_stack.

2. Analyze the instruction code C1_old that triggered the hardware breakpoint, calculate the deviation between the designated memory address A_ok that holds the correct data and the memory address A_error where the ECC error occurred, and use this deviation to correct and recreate the instruction code as C1_new1 (the corrected data read/write instruction).

3. Place the newly created instruction C1_new1 at the designated memory address A1_code2.

4. After the new instruction C1_new1, append a branch instruction C1_new2 that jumps to the address of the assembly instruction following the instruction C_old that triggered the breakpoint (C_old + length of C_old).

5. Restore the register values of the breakpoint exception context from the address A1_stack.

6. Jump to the newly created instruction C1_new1.

In step S603, Vector1 is created.

Step S504, creating a special hardware instruction breakpoint exception handling function Vector2.

Wherein, referring to FIG. 7, the following steps are included:

Step S701, after the system is started, two special memory locations A2_code (Vector2 entry address) and A2_stack (Vector2 stack frame address) are specified.

Step S702, placing a function in the form of binary code at A2_code. The function content is:

1. After the designated correct-data address A_ok (the designated fault-free special memory address, which holds the corrected original program instruction from A_error), append the assembly instruction C2_new. C2_new is a jump from its own address (A_ok + the length of the assembly instruction represented by the data D_ok at A_ok) to the address of the assembly instruction following the instruction C_old that triggered the breakpoint (C_old + length of C_old).

2. Restore the register values of the breakpoint exception context from the address A2_stack.

3. Jump to A_ok.

In step S703, Vector2 is created.

Step S505, modifying the original hardware breakpoint exception processing function Vector of the system.

Wherein, referring to FIG. 8, the following steps are included:

Step S801, determining whether the currently interrupted instruction is one in which the CPU accessed the faulty memory address A_error. If so, further determine whether the hardware breakpoint is an instruction breakpoint or a data breakpoint: if it is a data breakpoint, jump directly to the special hardware data breakpoint exception handling function Vector1 created in step S503; if it is an instruction breakpoint, jump to the special hardware instruction breakpoint exception handling function Vector2 created in step S504. If the currently interrupted instruction is unrelated to the faulty memory address A_error, execute the normal hardware breakpoint exception handling function Vector.

In step S802, the Vector is modified.
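The decision made by the modified handler Vector can be summarized in a few lines. This sketch returns handler names as strings purely for illustration; the `kind` encoding ("data" or "instruction") and the function name are assumptions, not part of the described system.

```python
def dispatch_breakpoint(trap_addr, kind, a_error):
    """Pick the handler for a hardware breakpoint exception.

    trap_addr: address that matched the breakpoint
    kind: "data" or "instruction" (hypothetical encoding of the breakpoint type)
    a_error: the faulty memory address A_error
    """
    if trap_addr != a_error:
        return "Vector"       # unrelated to the faulty address: original handling
    if kind == "data":
        return "Vector1"      # redirect the data read/write to A_ok
    if kind == "instruction":
        return "Vector2"      # jump to the corrected instruction stored at A_ok
    raise ValueError(f"unknown breakpoint kind: {kind!r}")


A_ERROR = 0x8000_1230  # hypothetical faulty address
print(dispatch_breakpoint(A_ERROR, "data", A_ERROR))
print(dispatch_breakpoint(A_ERROR, "instruction", A_ERROR))
print(dispatch_breakpoint(0x8000_2000, "data", A_ERROR))
```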

In step S506, a memory ECC interrupt processing function vector_ecc is created.

Wherein, referring to FIG. 9, the following steps are included:

Step S901, after the CPU reports an ECC check error, it is determined whether the error is a true hard error, so that soft errors are excluded. The method is as follows. In the ECC interrupt handler, the memory address A_error at which the error occurred is obtained from the ECC error capture address register of the memory controller, the data D_error is obtained from the ECC error capture data register of the memory controller, and the syndrome code D_syndrome is obtained from the ECC syndrome register of the memory controller and translated to identify which bit is faulty, from which the correct data D_ok is calculated. Then a certain number of 0/1 read/write tests are performed on the memory address A_error to determine whether the address is still faulty. If the fault does not recur, the error is judged to be a soft error: the previously calculated data D_ok is written back to the address A_error, ECC interrupt processing exits, and the process ends. If the fault persists, a hard error is confirmed: the correct data D_ok is saved to the designated special address A_ok, and a hardware breakpoint is set at the faulty memory address A_error. Because a memory address may hold either data or code, and an address that holds code may also be accessed as data, both the instruction breakpoint Bc and the data breakpoint Bd must be set at the same time.

In step S902, the vector_ecc is modified.

In step S507, the device is created.

The above steps S501 to S507 are the generation of the device or the creation process of the software.

FIG. 10 is a flowchart of a process in which an ECC check error occurs according to an embodiment of the present invention, including the following steps:

In step S1001, an ECC check error occurs.

In step S1002, the ECC interrupt processing vector_ecc is entered, and the A_error address is tested.

In this step, the data D_error is obtained from the ECC error capture data register of the memory controller, and the syndrome code D_syndrome is obtained from the ECC syndrome register of the memory controller and translated to identify which bit is faulty, from which the correct data D_ok is calculated.

Step S1003, a certain number of 0/1 read/write tests are performed on the memory address A_error where the ECC error occurred to determine whether the address is still faulty. If the fault does not recur, the error is judged to be a soft error: the previously calculated data D_ok is written back to the address A_error, the ECC interrupt processing exits, and step S1006 is performed. If the fault persists, a hard error is confirmed.

In step S1004, the obtained correct data D_ok is saved to the specified special address A_ok.

In step S1005, a hardware breakpoint is set at the faulty memory address A_error.

In step S1006, the process ends.
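The flow of steps S1002 to S1006 can be sketched against a simulated memory. This is only an illustration under stated assumptions: `SimulatedMemory`, `is_hard_error`, and `handle_ecc_error` are hypothetical names, the memory is modeled as 8-bit words with an optional stuck bit, the test patterns are all zeros and all ones, and the hardware breakpoint is represented by adding the address to a set.

```python
class SimulatedMemory:
    """8-bit word memory; stuck_bits maps address -> (mask, stuck_value)."""

    def __init__(self, stuck_bits=None):
        self.cells = {}
        self.stuck_bits = stuck_bits or {}

    def write(self, addr, value):
        self.cells[addr] = value & 0xFF

    def read(self, addr):
        value = self.cells.get(addr, 0)
        if addr in self.stuck_bits:
            mask, stuck = self.stuck_bits[addr]
            # Bits under the mask always read back as the stuck value.
            value = (value & ~mask) | (stuck & mask)
        return value & 0xFF


def is_hard_error(mem, addr, rounds=4):
    """Repeat an all-zeros / all-ones write-and-read-back test on addr."""
    for _ in range(rounds):
        for pattern in (0x00, 0xFF):
            mem.write(addr, pattern)
            if mem.read(addr) != pattern:
                return True   # the fault persists
    return False


def handle_ecc_error(mem, a_error, d_ok, a_ok, breakpoints):
    """Classify the error, then either repair in place or redirect."""
    if not is_hard_error(mem, a_error):
        mem.write(a_error, d_ok)   # soft error: write D_ok back to A_error
        return "soft"
    mem.write(a_ok, d_ok)          # hard error: save D_ok at A_ok
    breakpoints.add(a_error)       # stands in for setting the breakpoint
    return "hard"


# Address 0x1000 has bit 5 stuck at 0; address 0x1008 is healthy.
mem = SimulatedMemory({0x1000: (0x20, 0x00)})
breakpoints = set()
hard_result = handle_ecc_error(mem, 0x1000, 0xB2, 0x2000, breakpoints)
soft_result = handle_ecc_error(mem, 0x1008, 0x7C, 0x2008, breakpoints)
```

Here the stuck bit makes the all-ones pattern read back incorrectly, so the first call takes the hard-error path, while the second call restores the corrected data in place.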

FIG. 11 is a flowchart of a process in which a hardware breakpoint exception occurs according to an embodiment of the present invention, including the following steps:

In step S1101, a hardware breakpoint exception occurs.

In step S1102, the hardware breakpoint exception processing function Vector is entered to determine the cause of the interruption.

In step S1103, it is determined whether the currently interrupted instruction is that the CPU has accessed the faulty memory address A_error, and if so, step S1105 is performed, and if no, step S1104 is performed.

In step S1104, processing proceeds according to the normal hardware breakpoint exception handling function Vector, and then step S1110 is performed.

In step S1105, it is determined whether the hardware breakpoint cause is an instruction breakpoint or a data breakpoint. If it is a data breakpoint, step S1108 is performed, and if it is an instruction breakpoint, step S1106 is performed.

In step S1106, the special hardware instruction breakpoint processing function Vector2 is entered.

In step S1107, the instruction C2_new is created and jumped to A_ok, and step S1110 is performed.

In step S1108, a special data breakpoint exception handling function Vector1 is entered.

In step S1109, new data access instructions C2_new1 and C2_new2 are created and jump to C2_new1.

In step S1110, the program returns to the original process and continues to execute.
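The dispatch logic of steps S1102 to S1110 can be sketched as follows. This is a simplified model rather than an exception handler for any particular CPU: the function names, the `redirects` table that maps a faulty address to an offset, and the returned tuples are all assumptions made for illustration, and the construction of the replacement instructions (C2_new, C2_new1, C2_new2) is not modeled.

```python
def normal_vector(address):
    """Ordinary breakpoint handling, unrelated to a faulty address (S1104)."""
    return ("normal", address)


def vector1_data(target):
    """Special data breakpoint handler: redirect the data access (S1108-S1109)."""
    return ("data", target)


def vector2_instruction(target):
    """Special instruction breakpoint handler: redirect the fetch (S1106-S1107)."""
    return ("instruction", target)


def vector(address, kind, redirects):
    """Decide how a hardware breakpoint exception is handled.

    address   -- address whose access triggered the breakpoint
    kind      -- "instruction" or "data"
    redirects -- hypothetical table: faulty address -> offset to the good address
    """
    offset = redirects.get(address)
    if offset is None:
        return normal_vector(address)        # S1103: not a faulty address
    target = address + offset                # faulty address plus offset
    if kind == "data":                       # S1105: data breakpoint
        return vector1_data(target)
    return vector2_instruction(target)       # S1105: instruction breakpoint


A_ERROR, A_OK = 0x1000, 0x2000
redirects = {A_ERROR: A_OK - A_ERROR}

data_case = vector(A_ERROR, "data", redirects)
code_case = vector(A_ERROR, "instruction", redirects)
other_case = vector(0x3000, "data", redirects)
```

Storing an offset rather than the target address mirrors the idea of computing the deviation between the faulty address and the good address, then applying it to the accessed address.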

As shown in FIGS. 10 and 11, with the above device, if a memory hard error occurs during system operation, the faulty memory address is redirected to a fault-free memory address. When the CPU accesses the faulty memory address, the access is diverted to the redirected memory address, so that the uncorrectable memory cell is bypassed. This avoids serious consequences such as program errors or system crashes and improves the stability of the system.

Example 4

Embodiments of the present invention also provide a storage medium. In this embodiment, the above storage medium may be configured to store program code for performing the following steps:

S1, determining that the first address of the memory has a hard error fault;

S2, performing error correction on the memory information in the first address, and storing the error-corrected memory information in the second address of the memory without failure;

S3, inserting a hardware breakpoint at the first address, wherein the hardware breakpoint is used to monitor whether the first address is accessed, and jumps from an access instruction to the first address to an access instruction to the second address.

In this embodiment, the foregoing storage medium may include, but is not limited to, various media that can store program code, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.

In this embodiment, the processor determines, according to the program code stored in the storage medium, that a hard error fault has occurred at the first address of the memory;

In this embodiment, the processor performs error correction on the memory information in the first address according to the stored program code in the storage medium, and stores the error-corrected memory information in the second address in the memory without failure;

In this embodiment, the processor inserts, according to the program code stored in the storage medium, a hardware breakpoint at the first address, wherein the hardware breakpoint is used to monitor whether the first address is accessed and to jump from an access instruction for the first address to an access instruction for the second address.

For examples in this embodiment, reference may be made to the examples described in the foregoing embodiments and implementation manners, and details are not described herein again.

The modules or steps of the above embodiments of the present invention may be implemented by a general-purpose computing device. They may be centralized on a single computing device or distributed over a network of multiple computing devices. They may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases, the steps shown or described may be performed in an order different from that described herein, or the modules or steps may be fabricated as individual integrated circuit modules, or multiple modules or steps among them may be fabricated as a single integrated circuit module. Thus, embodiments of the invention are not limited to any specific combination of hardware and software.

The above description is only for the embodiments of the present invention, and is not intended to limit the present application, and various changes and modifications may be made by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. made within the spirit and principles of this application are intended to be included within the scope of the present application.

Industrial applicability

According to the embodiments of the present invention, the memory information at the first address where the hard error occurred can still be accessed without interrupting service or replacing the memory. The memory cell that cannot be error-corrected is bypassed, which avoids serious consequences such as program errors or system crashes and improves the stability of the system.

Claims

A method for handling memory hard errors, including:

Determining that the first address of the memory has a hard error fault;

Performing error correction on the memory information in the first address, and storing the error-corrected memory information in the second address in the memory without failure;

Inserting a hardware breakpoint at the first address, wherein the hardware breakpoint is used for monitoring whether the first address is accessed, and for jumping from an access instruction for the first address to an access instruction for the second address.
The method of claim 1 wherein said determining a first address of the memory with a hard error fault comprises:

Receiving the error detection correction ECC interrupt signal reported by the memory;

Finding the corresponding first address in the ECC error trapping address register according to the ECC interrupt signal.
The method of claim 2, wherein after the finding the corresponding first address in the ECC error trapping address register according to the ECC interrupt signal, the method further comprises:

Performing a predetermined number of read and write tests on the memory information of the first address to determine whether the first address is faulty;

When the test result indicates that the first address is faulty, it is determined that the first address has a hard error fault.
The method of claim 1, wherein the jump from an access instruction to the first address to an access instruction to the second address comprises:

Receiving breakpoint abnormality information triggered by the hardware breakpoint for the access operation of the first address;

Performing at least one of the following:

When the breakpoint abnormality information indicates that the memory information of the first address is instruction information, jumping from an access instruction that reads the instruction information of the first address to an access instruction for the second address;

When the breakpoint abnormality information indicates that the memory information of the first address is data information, jumping from an access instruction that reads the data information of the first address to an access instruction for the second address;

When the breakpoint abnormality information indicates that the memory information of the first address is data information, jumping from an access instruction that writes the data information of the first address to an access instruction for the second address.
The method of claim 1, wherein the jump from an access instruction to the first address to an access instruction to the second address comprises:

Receiving an access instruction to the first address;

Calculating a deviation of the first address to the second address;

Correcting the first address by the deviation to obtain the second address, and jumping to the access instruction for the second address.
The method of claim 5, wherein after jumping to the second address, the method further comprises:

Jumping to the instruction that follows the access instruction for the first address.
A memory hard error processing device, comprising:

a determining module, configured to determine that a hard error fault occurs at a first address of the memory;

an error correction module, configured to perform error correction on the memory information in the first address, and store the error-corrected memory information in a fault-free second address in the memory;

a jump module, configured to insert a hardware breakpoint at the first address, wherein the hardware breakpoint is configured to monitor whether the first address is accessed and to jump from an access instruction for the first address to an access instruction for the second address.
The apparatus of claim 7, wherein the determining module comprises:

a first receiving unit, configured to receive an error detection correction ECC interrupt signal reported by the memory;

a searching unit, configured to search for the corresponding first address in the ECC error trapping address register according to the ECC interrupt signal.
The apparatus of claim 8, wherein the determining module further comprises:

a testing unit, configured to: after the searching unit searches for the corresponding first address in the ECC error trapping address register according to the ECC interrupt signal, perform a predetermined number of read and write tests on the memory information of the first address to determine whether the first address is faulty;

a determining unit, configured to determine that the first address has a hard error fault when the test result indicates that the first address is faulty.
The apparatus of claim 7, wherein the jump module further comprises:

a second receiving unit, configured to receive breakpoint abnormality information triggered by an access operation of the hardware breakpoint for the first address;

a first jump unit, configured to jump from an access instruction that reads the instruction information of the first address to an access instruction for the second address when the breakpoint abnormality information indicates that the memory information of the first address is instruction information;

a second jump unit, configured to perform at least one of the following when the breakpoint abnormality information indicates that the memory information of the first address is data information: jumping from an access instruction that reads the data information of the first address to an access instruction for the second address; jumping from an access instruction that writes the data information of the first address to an access instruction for the second address.
A storage medium configured to store program code for performing the following steps:

Determining that the first address of the memory has a hard error fault;

Performing error correction on the memory information in the first address, and storing the error-corrected memory information in the second address in the memory without failure;

Inserting a hardware breakpoint at the first address, wherein the hardware breakpoint is used for monitoring whether the first address is accessed, and for jumping from an access instruction for the first address to an access instruction for the second address.