CN117312037A - Memory repair method and device, electronic equipment and storage medium - Google Patents

Memory repair method and device, electronic equipment and storage medium

Info

Publication number
CN117312037A
Authority
CN
China
Prior art keywords
repaired
unit
memory
repair
ppr
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311297629.2A
Other languages
Chinese (zh)
Inventor
高静
李琛琛
葛士建
汤俊良
袁勇
彭亮
王峰
张宇
王剑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zitiao Network Technology Co Ltd
Original Assignee
Beijing Zitiao Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zitiao Network Technology Co Ltd
Priority to CN202311297629.2A
Publication of CN117312037A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793Remedial or corrective actions
    • G06F11/0706Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/073Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a memory management context, e.g. virtual memory or cache management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The present disclosure relates to a memory repair method and device, an electronic device, and a storage medium. The method includes: obtaining a unit to be repaired in a memory; obtaining a repair opportunity for the unit to be repaired, where the repair opportunity is either while the memory is running or at system restart; and, based on the repair opportunity, calling a target PPR module through a PPR interface to repair the unit to be repaired. This removes the restriction to faults found by the firmware detection program, allows more detection methods to be used to detect and repair memory faults while the device is running, and reduces the dependence of memory repair on system restart, which can substantially improve memory repair efficiency.

Description

Memory repair method and device, electronic equipment and storage medium
Technical Field
The disclosure relates to the field of computer technology, and in particular, to a memory repair method, a memory repair device, an electronic device and a storage medium.
Background
Memory is one of the key components of servers and other devices. During use, memory often fails for various reasons, and then either the memory module must be replaced or the fault in the memory must be repaired.
In the related art, faults in memory are generally repaired by PPR (Post Package Repair). However, PPR is highly dependent on the firmware detection program and repairs only the faults that program detects. PPR also relies on a system restart to carry out the repair. As a result, PPR repairs memory faults inefficiently.
Disclosure of Invention
The disclosure provides a memory repair method, a memory repair device, electronic equipment and a storage medium.
According to an aspect of the present disclosure, there is provided a memory repair method, including:
obtaining a unit to be repaired in a memory;
obtaining a repair opportunity for the unit to be repaired, wherein the repair opportunity comprises memory running or system restart;
and, based on the repair opportunity, calling a target PPR module through a PPR interface to repair the unit to be repaired.
According to another aspect of the present disclosure, there is provided a memory repair apparatus, including:
a to-be-repaired unit obtaining module, configured to obtain a unit to be repaired in a memory;
a repair opportunity obtaining module, configured to obtain a repair opportunity for the unit to be repaired, wherein the repair opportunity comprises memory running or system restart;
and a repair module, configured to call a target PPR module through a PPR interface to repair the unit to be repaired based on the repair opportunity.
According to a third aspect of the present disclosure, an electronic device is provided. The electronic device includes: a memory and a processor, the memory having stored thereon a computer program, the processor implementing the method as described above when executing the program.
According to a fourth aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor implements the above-described method of the present disclosure.
With the memory repair method, the memory repair device, the electronic device, and the storage medium provided by the present disclosure, the unit to be repaired in the memory is obtained, the repair opportunity for that unit is obtained, and, based on the repair opportunity, the target PPR module is called through the PPR interface to repair the unit. This removes the restriction to faults found by the firmware detection program, allows more detection methods to be used to detect and repair memory faults while the device is running, and reduces the dependence of memory repair on system restart, which can substantially improve memory repair efficiency.
Drawings
Further details, features and advantages of the present disclosure are disclosed in the following description of exemplary embodiments, with reference to the following drawings, wherein:
FIG. 1 is a flow chart of a memory repair method according to an exemplary embodiment of the present disclosure;
FIG. 2 is a schematic block diagram of functional modules of a memory repair device according to an exemplary embodiment of the present disclosure;
FIG. 3 is a block diagram of an electronic device provided in an exemplary embodiment of the present disclosure;
FIG. 4 is a block diagram of a computer system according to an exemplary embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the accompanying drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "including" and variations thereof as used herein are open-ended, i.e., "including, but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Related definitions of other terms will be given in the description below. It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.
It should be noted that the modifiers "a"/"one" and "a plurality of" in this disclosure are illustrative rather than limiting, and those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
It will be appreciated that prior to using the technical solutions disclosed in the embodiments of the present disclosure, the user should be informed and authorized of the type, usage range, usage scenario, etc. of the personal information related to the present disclosure in an appropriate manner according to the relevant legal regulations.
For example, in response to receiving an active request from a user, prompt information is sent to the user to state explicitly that the operation the user is requesting will require obtaining and using the user's personal information. The user can then decide, based on the prompt information, whether to provide personal information to the software or hardware, such as an electronic device, an application program, a server, or a storage medium, that performs the operations of the technical solution of the present disclosure.
As an alternative but non-limiting implementation, in response to receiving an active request from a user, the manner in which the prompt information is sent to the user may be, for example, a popup, in which the prompt information may be presented in a text manner. In addition, a selection control for the user to select to provide personal information to the electronic device in a 'consent' or 'disagreement' manner can be carried in the popup window. It will be appreciated that the above-described notification and user authorization process is merely illustrative and not limiting of the implementations of the present disclosure, and that other ways of satisfying relevant legal regulations may be applied to the implementations of the present disclosure.
When the memory of a server or other device fails, the simplest remedy is to replace the memory module. When only some of the DRAM chips in the module are faulty, however, replacing the whole module wastes resources.
Memory with such faults can instead be repaired with PPR, which addresses the problem by reserving a number of redundant (spare) units in the memory. Specifically, a faulty unit in which an error has occurred can be found during system startup, and a spare redundant unit can be used to replace it, completing the repair automatically. Currently, almost all mainstream DRAM manufacturers support this function in standard DDR4/DDR5 (Double Data Rate synchronous dynamic random-access memory).
However, PPR in the related art depends heavily on the firmware detection program and repairs only the faults that program detects, and the detected faults do not necessarily match the faulty units observed at the system level. In addition, PPR can be executed only when the machine restarts; there is no other execution path.
Therefore, the embodiments of the present disclosure define a standard PPR interface and combine it with specific fault detection and analysis techniques to better locate faulty units in the memory. PPR is called through the defined PPR interface, so that faulty units can be repaired while the memory is running, without depending on the firmware detection program.
In this embodiment, a faulty unit in the memory is referred to as a unit to be repaired. A unit to be repaired may be a unit that is predicted from data to be about to fail, or a unit that has already failed as determined from fault information. To predict units to be repaired, the logs of a large-scale cluster can be collected and analyzed to filter out the error logs that indicate problems.
For example, the error logs from a historical period and the units that subsequently failed can be obtained, and an association between the error logs and the faulty units can be established. The error logs serve as samples, and the fault information of the corresponding faulty units serves as the sample labels. A preset model is trained on these samples until a training stop condition is reached, and the trained model is used as a memory fault prediction model.
Inputting the error logs of the current period into the memory fault prediction model then predicts the units to be repaired that are about to fail. For example, the model can predict the address, fault type, fault state, and other information of a unit to be repaired. Because such units are identified in advance, the PPR module can be called through the predefined PPR interface to repair them before the memory fault affects normal operation of the device.
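As an illustrative, non-limiting sketch of how samples and labels might be paired before training, the following assumes each error log entry is an `(address, message)` tuple and that the set of units that later failed is known. The function name `build_training_samples` and the single `error_count` feature are hypothetical; the disclosure does not specify the features or the model.

```python
from collections import Counter

def build_training_samples(error_logs, later_faults):
    """Pair each unit's historical error count with whether it later failed.

    error_logs: iterable of (unit_address, message) tuples from a past window.
    later_faults: set of unit addresses that failed after that window.
    Both shapes are assumptions made for this sketch.
    """
    counts = Counter(address for address, _ in error_logs)
    # Sample = features derived from the error log; label = 1 if the unit later failed.
    return [({"address": addr, "error_count": n}, int(addr in later_faults))
            for addr, n in sorted(counts.items())]

logs = [(0x10, "CE"), (0x10, "CE"), (0x20, "CE")]
samples = build_training_samples(logs, later_faults={0x10})
```

The resulting list of `(features, label)` pairs could then be passed to whatever preset model is chosen.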
To determine units that have already failed, fault information for the memory can be obtained from data such as the device logs. For example, faults in memory may be identified by collecting in-band or out-of-band logs and analyzing them with a parsing service. For a memory fault, a memory address resolution tool may be used to translate the erroneous system address into a specific faulty unit, providing fine-grained fault localization. Compared with relying on the firmware detection program, this approach can use a wider range of techniques to predict or detect memory faults, is not limited to the firmware's fault detection method, and can improve prediction and detection efficiency.
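The address resolution step can be pictured with the sketch below. Real system-address decoding depends on the platform and memory controller configuration, which the disclosure does not describe, so the bit layout here (10-bit column, 16-bit row, 4-bit bank, 1-bit rank) and the names `FaultUnit` and `decode_system_address` are purely hypothetical.

```python
from typing import NamedTuple

class FaultUnit(NamedTuple):
    rank: int
    bank: int
    row: int
    column: int

def decode_system_address(addr: int) -> FaultUnit:
    # Hypothetical layout, low to high bits: 10-bit column, 16-bit row,
    # 4-bit bank, 1-bit rank. Real hardware uses platform-specific mappings.
    column = addr & 0x3FF
    row = (addr >> 10) & 0xFFFF
    bank = (addr >> 26) & 0xF
    rank = (addr >> 30) & 0x1
    return FaultUnit(rank, bank, row, column)

unit = decode_system_address((1 << 30) | (3 << 26) | (0x1234 << 10) | 0x2A)
```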
In this embodiment, defining the PPR interface means defining its parameters, the data type of each parameter, and a description of each parameter. For example, the parameters may include an address parameter, a type parameter, and a status parameter. The address parameter represents the address of the unit to be repaired; the type parameter represents the fault type, for example a correctable error or an uncorrectable error; and the status parameter may take values such as to-be-repaired or repaired.
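One possible shape for such an interface is sketched below. The three parameters come from the description above; the class names, enum values, and the convention that the target PPR module is a callable taking an address and returning success are assumptions for illustration only.

```python
from dataclasses import dataclass
from enum import Enum

class FaultType(Enum):
    CORRECTABLE = "correctable"
    UNCORRECTABLE = "uncorrectable"

class RepairStatus(Enum):
    TO_BE_REPAIRED = "to_be_repaired"
    REPAIRED = "repaired"

@dataclass
class PprRequest:
    address: int                 # address of the unit to be repaired
    fault_type: FaultType        # correctable or uncorrectable error
    status: RepairStatus = RepairStatus.TO_BE_REPAIRED

def call_ppr(request: PprRequest, ppr_module) -> PprRequest:
    """Invoke the target PPR module (hypothetically, any callable that takes
    an address and returns True on success) and update the status."""
    if ppr_module(request.address):
        request.status = RepairStatus.REPAIRED
    return request

result = call_ppr(PprRequest(0x1F40, FaultType.CORRECTABLE), ppr_module=lambda addr: True)
```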
In this embodiment, multiple units to be repaired, belonging to one memory or to several memories, can be obtained and repaired in batches. During batch repair, the units can be repaired serially in a single thread: for example, the address and other parameters of each unit to be repaired are obtained in turn, and the units are repaired one by one.
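A minimal sketch of serial, single-threaded batch repair follows. The `repair_one` callable stands in for a call through the PPR interface and is hypothetical, as is the `fake_repair` stub used to exercise it.

```python
def repair_batch(addresses, repair_one):
    """Repair units one at a time, in order, on the calling thread."""
    results = {}
    for address in addresses:
        # Each unit is fully handled before the next one is started.
        results[address] = repair_one(address)
    return results

order = []

def fake_repair(address):
    # Stub for illustration: records call order and pretends 0x20 cannot be repaired.
    order.append(address)
    return address != 0x20

outcome = repair_batch([0x10, 0x20, 0x30], fake_repair)
```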
When a unit in the memory that needs repair is predicted or detected in the manner above, that unit has either already failed or is about to fail. Because the device may still be running at this point, prompt information can be generated, for example to ask the user whether the fault should be repaired at runtime.
On receiving the prompt, the user may make a selection: to repair the fault at runtime, or not to. If no selection is received from the user, the default is not to repair at runtime. "Runtime" in this embodiment refers to the memory running, that is, the memory being in a working state.
When the user chooses to repair at runtime, the memory is still running. To prevent the fault from causing further damage, and to avoid the repair affecting memory operation, the faulty unit in the memory can be isolated. Fault information is recorded at the same time; it may include the fault type, the address of the faulty unit, the state of the faulty unit, and so on. Note that in this embodiment the units to be repaired include faulty units. When the device containing the memory (such as a server) restarts, the PPR module can be called through the predefined PPR interface, according to the recorded fault information, to repair the faulty unit.
When the user chooses not to repair at runtime, the fault information can be recorded first; when a restart of the device containing the memory is detected, the PPR module can be called through the predefined PPR interface, according to the recorded fault information, to repair the faulty unit.
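The mapping from the user's selection to a repair opportunity, including the default when no selection is received, might look like the sketch below. The string values and the function name `resolve_selection` are hypothetical.

```python
from typing import Optional

def resolve_selection(selection: Optional[str]) -> str:
    """Map the user's answer to a repair opportunity.

    selection: "repair_at_runtime", "no_runtime_repair", or None if no
    answer arrived (all three values are assumptions for this sketch).
    """
    if selection == "repair_at_runtime":
        return "memory_running"
    # An explicit refusal and a missing answer both defer the repair to restart.
    return "system_restart"
```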
Based on the foregoing embodiments, in still another embodiment provided by the present disclosure, there is further provided a memory repair method, as shown in fig. 1, including the following steps:
in step S110, a unit to be repaired in the memory is acquired.
In an embodiment, the unit to be repaired may include one or both of a unit in the memory that may be about to fail and a unit in the memory that has failed.
For a unit that may be about to fail, the unit to be repaired can be obtained by prediction. Specifically, error log data of the server cluster may be obtained, and the unit to be repaired in the memory may be predicted from that data. For example, the logs of a large-scale cluster are collected and analyzed, the error log data that indicate problems are filtered out, and the units to be repaired are predicted from those error logs. See the description of the embodiments above for details, which are not repeated here.
For a unit that has already failed, fault information for the memory can be obtained, the faulty unit in the memory is determined from that fault information, and the faulty unit is taken as the unit to be repaired. Faults in memory may be identified, for example, by collecting in-band or out-of-band logs and analyzing them with a parsing service. A memory address resolution tool can then translate the erroneous system address into a specific faulty unit, providing fine-grained fault localization.
In step S120, a repair opportunity of the unit to be repaired is obtained, where the repair opportunity includes memory operation or system restart.
In an embodiment, the repair opportunity of the unit to be repaired may be obtained, for example, by detecting whether, in the current environment, the unit should be repaired while the memory is running or when the system restarts.
For example, the current operation state of the memory, such as whether the operation load of the memory is large, whether the memory is executing a relatively important task, whether the occupancy rate of the memory is greater than a threshold value, etc., may be detected, so as to determine the repair opportunity based on the operation state of the memory.
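A sketch of a state-based decision is given below. The disclosure lists the signals (load, important tasks, occupancy versus a threshold) but not the rule, so the direction chosen here (defer to restart when the memory is busy), the 0.8 threshold, and all names are assumptions.

```python
from enum import Enum

class RepairOpportunity(Enum):
    MEMORY_RUNNING = "memory_running"
    SYSTEM_RESTART = "system_restart"

def choose_opportunity(occupancy: float, critical_task_running: bool,
                       occupancy_threshold: float = 0.8) -> RepairOpportunity:
    # Assumed rule: defer to restart if memory is heavily occupied or a
    # critical task is running; otherwise handle the unit at runtime.
    if critical_task_running or occupancy > occupancy_threshold:
        return RepairOpportunity.SYSTEM_RESTART
    return RepairOpportunity.MEMORY_RUNNING
```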
In addition, in this embodiment, prompt information can be sent to the user, and the repair opportunity is determined from the user's selection.
Specifically, user prompt information asking whether the unit to be repaired should be repaired while the memory is running can be generated, a selection made in response to the prompt is received, and whether to repair the unit while the memory is running is determined from that selection. When the selection indicates that the unit does not need to be repaired while the memory is running, the unit is repaired when the system restarts.
In step S130, based on the repair opportunity, the target PPR module is invoked to repair the unit to be repaired through the PPR interface.
In an embodiment, a target interface of the target PPR module may be created, the target interface comprising a type parameter, a status parameter, and an address parameter of the unit to be repaired. Creating the PPR interface allows the PPR module to be called flexibly, removes the restriction to the firmware detection program, and reduces the dependence on system restart when repairing a unit to be repaired in the memory.
With the memory repair method provided by the embodiments of the present disclosure, the unit to be repaired in the memory is obtained, the repair opportunity for that unit is obtained, and, based on the repair opportunity, the target PPR module is called through the PPR interface to repair the unit. This removes the restriction to faults found by the firmware detection program, allows more detection methods to be used to detect and repair memory faults while the device is running, and reduces the dependence of memory repair on system restart, which can substantially improve memory repair efficiency.
Based on the above embodiment, in still another embodiment provided by the present disclosure, when the repair opportunity is memory running, step S130 may further include the following steps:
In step S131, address information of the unit to be repaired is obtained.
In step S132, the unit to be repaired is isolated based on the address information, and fault information of the unit to be repaired is recorded.
In step S133, when a restart of the server containing the memory is detected, the target PPR module is called through the PPR interface to repair the unit to be repaired based on the fault information and the address information.
In an embodiment, when the current running state of the memory indicates that the unit to be repaired can be handled while the memory is running, or when the user chooses to repair at runtime, the faulty unit in the memory can be isolated. Fault information is recorded at the same time; it may include at least one of the fault type, the address of the faulty unit, the state of the faulty unit, and the like. Note that in this embodiment the units to be repaired include faulty units. When the device containing the memory (such as a server) restarts, the PPR module can be called through the predefined PPR interface, according to the recorded fault information, to repair the faulty unit.
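Steps S131 to S133 can be sketched as follows. The `isolate` and `ppr_module` callables are hypothetical placeholders for platform-specific isolation and for the target PPR module; the record layout is likewise an assumption.

```python
class RuntimeRepairer:
    def __init__(self, isolate, ppr_module):
        self.isolate = isolate        # hypothetical callable: isolate(address)
        self.ppr_module = ppr_module  # hypothetical callable: ppr_module(address, fault_type) -> bool
        self.pending = []             # recorded fault information awaiting repair

    def handle_at_runtime(self, address, fault_type):
        # S131/S132: take the address, isolate the unit, record its fault information.
        self.isolate(address)
        self.pending.append({"address": address, "type": fault_type,
                             "status": "to_be_repaired"})

    def on_restart(self):
        # S133: once a restart is detected, repair each recorded unit via PPR.
        for record in self.pending:
            if self.ppr_module(record["address"], record["type"]):
                record["status"] = "repaired"
        return self.pending

isolated = []
repairer = RuntimeRepairer(isolate=isolated.append,
                           ppr_module=lambda addr, kind: True)
repairer.handle_at_runtime(0x1F40, "correctable")
records = repairer.on_restart()
```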
Based on the above embodiment, in still another embodiment provided by the present disclosure, when the repair opportunity is system restart, step S130 may further include the following steps:
In step S134, address information of the unit to be repaired and fault information of the unit to be repaired are obtained.
In step S135, when a restart of the server containing the memory is detected, the target PPR module is called through the PPR interface to repair the unit to be repaired based on the fault information and the address information.
In an embodiment, for example, when the current running state of the memory indicates that the unit to be repaired cannot be repaired while the memory is running and must be repaired at system restart, or when the user chooses not to repair at runtime, the fault information can be recorded first. When a restart of the device containing the memory is detected, the PPR module can be called through the predefined PPR interface, according to the recorded fault information, to repair the faulty unit.
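Because the recorded fault information has to survive until the restart, one illustrative option is to persist it to a file. The disclosure does not say where the records are stored; the JSON file, its path, and the function names below are assumptions made for this sketch.

```python
import json
import os
import tempfile

def load_faults(path):
    """Return the recorded fault entries, or an empty list if none exist yet."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def record_fault(path, address, fault_type):
    """Append one fault record to a JSON file so it survives a restart."""
    records = load_faults(path)
    records.append({"address": address, "type": fault_type})
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f)

def repair_on_restart(path, ppr_module):
    """Called once a restart is detected: repair every recorded unit and
    return the addresses for which the (hypothetical) PPR callable succeeded."""
    return [rec["address"] for rec in load_faults(path)
            if ppr_module(rec["address"], rec["type"])]

path = os.path.join(tempfile.mkdtemp(), "pending_ppr.json")
record_fault(path, 0x1F40, "uncorrectable")
repaired = repair_on_restart(path, ppr_module=lambda addr, kind: True)
```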
In an embodiment, after the fault information of the memory is obtained, it may be parsed, and the fault type of the memory fault is determined from the parsed information. In practice, the fault types may include correctable errors and uncorrectable errors.
If the fault information contains a correctable-error state together with the occurrence time, error count, physical address information, and so on, the fault in the memory can be determined to be a correctable error. If it contains an uncorrectable-error state together with the occurrence time, error count, physical address information, and so on, the fault can be determined to be an uncorrectable error.
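The classification described above might be sketched as follows, assuming the parsed fault information is a dictionary with an `error_state` field. The field names, the accepted state strings, and the sample record are hypothetical.

```python
def classify_fault(fault_info: dict) -> str:
    """Return 'correctable' or 'uncorrectable' from a parsed record's error state."""
    state = fault_info.get("error_state", "").lower()
    if state in ("ue", "uncorrectable"):
        return "uncorrectable"
    if state in ("ce", "correctable"):
        return "correctable"
    raise ValueError(f"unknown error state: {state!r}")

# Hypothetical parsed record: state, occurrence time, error count, physical address.
kind = classify_fault({"error_state": "CE", "time": "2023-10-08T12:00:00",
                       "count": 3, "physical_address": 0x1F40})
```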
Where functional modules are divided according to their respective functions, the embodiments of the present disclosure provide a memory repair device, which may be a server or a chip applied to the server. Fig. 2 is a schematic block diagram of functional modules of a memory repair device according to an exemplary embodiment of the present disclosure. As shown in fig. 2, the memory repair device includes:
the unit to be repaired obtaining module 10 is used for obtaining the unit to be repaired in the memory;
a repair opportunity obtaining module 20, configured to obtain a repair opportunity of the unit to be repaired, where the repair opportunity includes the memory running or the system restarting;
and the repair module 30 is configured to invoke a target PPR module through a PPR interface to repair the unit to be repaired based on the repair opportunity.
In yet another embodiment provided in the present disclosure, the unit to be repaired obtaining module is specifically configured to:
obtaining error log data of a server cluster;
and predicting a unit to be repaired in the memory based on the error log data.
In yet another embodiment provided in the present disclosure, the unit to be repaired obtaining module is specifically configured to:
acquiring fault information in the memory;
and determining a fault unit in the memory based on the fault information, and taking the fault unit as the unit to be repaired.
In yet another embodiment provided by the present disclosure, the repair opportunity acquisition module is specifically configured to:
generating user prompt information of whether the unit to be repaired needs to be repaired in the memory operation process;
receiving a selection operation made in response to the user prompt information, and determining, based on the selection operation, whether the unit to be repaired needs to be repaired while the memory is running; and when the selection operation indicates that the unit to be repaired does not need to be repaired while the memory is running, repairing the unit to be repaired when the system restarts.
In yet another embodiment provided by the present disclosure, the apparatus further comprises:
the interface creation module is used for creating a target interface of the target PPR module; the target interface comprises type parameters, state parameters and address parameters of the unit to be repaired.
In another embodiment provided in the present disclosure, when the repair opportunity is memory running, the repair module is specifically configured to:
acquiring address information of the unit to be repaired;
performing isolation processing on the unit to be repaired based on the address information, and recording fault information of the unit to be repaired;
and when a restart of the server containing the memory is detected, calling a target PPR module through a PPR interface, based on the fault information and the address information, to repair the unit to be repaired.
In yet another embodiment provided by the present disclosure, in a case where the repair opportunity is a system restart, the repair module is specifically configured to:
acquiring address information of the unit to be repaired and fault information of the unit to be repaired;
and when a restart of the server containing the memory is detected, calling a target PPR module through a PPR interface, based on the fault information and the address information, to repair the unit to be repaired.
With the memory repair device provided by the embodiments of the present disclosure, the unit to be repaired in the memory is obtained, the repair opportunity for that unit is obtained, and, based on the repair opportunity, the target PPR module is called through the PPR interface to repair the unit. This removes the restriction to faults found by the firmware detection program, allows more detection methods to be used to detect faults in the memory, and reduces the dependence of memory repair on system restart, which can substantially improve memory repair efficiency.
The embodiments of the present disclosure also provide an electronic device, including: at least one processor; and a memory for storing instructions executable by the at least one processor; wherein the at least one processor is configured to execute the instructions to implement the above-described methods disclosed by the embodiments of the present disclosure.
Fig. 3 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present disclosure. As shown in fig. 3, the electronic device 1800 includes at least one processor 1801 and a memory 1802 coupled to the processor 1801, the processor 1801 may perform corresponding steps in the above-described methods disclosed by embodiments of the present disclosure.
The processor 1801 may also be referred to as a central processing unit (CPU), which may be an integrated circuit chip with signal processing capabilities. The steps of the above-described methods disclosed in the embodiments of the present disclosure may be accomplished by integrated logic circuits of hardware in the processor 1801 or by instructions in the form of software. The processor 1801 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present disclosure may be executed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software modules may reside in the memory 1802, for example in a storage medium well known in the art such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The processor 1801 reads the information in the memory 1802 and, in combination with its hardware, performs the steps of the method described above.
In addition, when various operations/processes according to the present disclosure are implemented by software and/or firmware, the corresponding programs may be installed from a storage medium or a network onto a computer system having a dedicated hardware structure, such as the computer system 1900 shown in fig. 4. With the various programs installed, the computer system is capable of performing various functions, including those described previously. Fig. 4 is a block diagram of a computer system according to an exemplary embodiment of the present disclosure.
Computer system 1900 is intended to represent various forms of digital electronic computing devices, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The computer system may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 4, the computer system 1900 includes a computing unit 1901, and the computing unit 1901 may perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1902 or a computer program loaded from a storage unit 1908 into a Random Access Memory (RAM) 1903. In the RAM 1903, various programs and data required for the operation of the computer system 1900 may also be stored. The computing unit 1901, ROM 1902, and RAM 1903 are connected to each other via a bus 1904. An input/output (I/O) interface 1905 is also connected to bus 1904.
Various components in computer system 1900 are connected to I/O interface 1905, including: an input unit 1906, an output unit 1907, a storage unit 1908, and a communication unit 1909. The input unit 1906 may be any type of device capable of inputting information to the computer system 1900; it may receive input numeric or character information and generate key signal inputs related to user settings and/or function controls of the computer system. The output unit 1907 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, video/audio output terminals, vibrators, and/or printers. The storage unit 1908 may include, but is not limited to, magnetic disks and optical disks. The communication unit 1909 allows the computer system 1900 to exchange information/data with other devices over a network, such as the internet, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers and/or chipsets, such as Bluetooth(TM) devices, WiFi devices, WiMax devices, cellular communication devices, and/or the like.
The computing unit 1901 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 1901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 1901 performs the various methods and processes described above. For example, in some embodiments, the above-described methods disclosed by embodiments of the present disclosure may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 1908. In some embodiments, part or all of the computer program may be loaded and/or installed onto computer system 1900 via ROM 1902 and/or communication unit 1909. In some embodiments, the computing unit 1901 may be configured to perform the above-described methods of the disclosed embodiments by any other suitable means (e.g., by means of firmware).
The disclosed embodiments also provide a computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the above-described method disclosed by the disclosed embodiments.
A computer readable storage medium in embodiments of the present disclosure may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium described above can include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specifically, the computer-readable storage medium described above may include one or more wire-based electrical connections, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device.
The disclosed embodiments also provide a computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the above-described methods of the disclosed embodiments.
In an embodiment of the present disclosure, computer program code for performing the operations of the present disclosure may be written in one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules, components or units referred to in the embodiments of the present disclosure may be implemented by software or hardware. The name of a module, component or unit does not, in some cases, constitute a limitation of the module, component or unit itself.
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
The above description is merely illustrative of some embodiments of the present disclosure and of the principles of the technology applied. It will be appreciated by persons skilled in the art that the scope of the disclosure referred to in this disclosure is not limited to the specific combinations of features described above, but also covers other embodiments which may be formed by any combination of the features described above or their equivalents without departing from the spirit of the disclosure. For example, it covers embodiments formed by substituting the features described above with technical features having similar functions disclosed in (but not limited to) the present disclosure.
Although some specific embodiments of the present disclosure have been described in detail by way of example, it should be understood by those skilled in the art that the above examples are for illustration only and are not intended to limit the scope of the present disclosure. It will be appreciated by those skilled in the art that modifications may be made to the above embodiments without departing from the scope and spirit of the disclosure. The scope of the present disclosure is defined by the appended claims.

Claims (10)

1. A memory repair method, the method comprising:
obtaining a unit to be repaired in a memory;
acquiring a repair opportunity of the unit to be repaired, wherein the repair opportunity comprises memory operation or system restart;
and calling, based on the repair opportunity, a target PPR module through a PPR interface to repair the unit to be repaired.
2. The method of claim 1, wherein the obtaining the unit to be repaired in the memory comprises:
obtaining error log data of a server cluster;
and predicting a unit to be repaired in the memory based on the error log data.
3. The method of claim 1, wherein the obtaining the unit to be repaired in the memory comprises:
acquiring fault information in the memory;
and determining a fault unit in the memory based on the fault information, and taking the fault unit as the unit to be repaired.
4. The method according to claim 1, wherein the obtaining the repair opportunity of the unit to be repaired includes:
generating user prompt information asking whether the unit to be repaired needs to be repaired during memory operation;
receiving a selection operation for the user prompt information, and determining, based on the selection operation, whether the unit to be repaired needs to be repaired during memory operation; and, when the selection operation indicates that the unit to be repaired does not need to be repaired during memory operation, repairing the unit to be repaired during the system restart.
5. The method according to any one of claims 1 to 4, further comprising:
creating a target interface of the target PPR module; the target interface comprises type parameters, state parameters and address parameters of the unit to be repaired.
6. The method of claim 1, wherein, in the case where the repair opportunity is the memory operation, the invoking the target PPR module through the PPR interface to repair the unit to be repaired includes:
acquiring address information of the unit to be repaired;
performing isolation processing on the unit to be repaired based on the address information, and recording fault information of the unit to be repaired;
and, when a restart of the server where the memory is located is detected, calling a target PPR module through a PPR interface, based on the fault information and the address information, to repair the unit to be repaired.
7. The method of claim 1, wherein, in the case where the repair opportunity is a system restart, the invoking the target PPR module through the PPR interface to repair the unit to be repaired includes:
acquiring address information of the unit to be repaired and fault information of the unit to be repaired;
and, when a restart of the server where the memory is located is detected, calling a target PPR module through a PPR interface, based on the fault information and the address information, to repair the unit to be repaired.
8. A memory repair device, the device comprising:
a to-be-repaired unit obtaining module, configured to obtain a unit to be repaired in a memory;
a repair opportunity acquisition module, configured to acquire a repair opportunity of the unit to be repaired, wherein the repair opportunity comprises memory operation or system restart;
and a repair module, configured to call, based on the repair opportunity, a target PPR module through a PPR interface to repair the unit to be repaired.
9. An electronic device, comprising:
at least one processor;
a memory for storing instructions executable by the at least one processor;
wherein the at least one processor is configured to execute the instructions to implement the method of any of claims 1-7.
10. A computer readable storage medium, characterized in that instructions in the computer readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the method of any one of claims 1-7.
CN202311297629.2A 2023-10-09 2023-10-09 Memory repair method and device, electronic equipment and storage medium Pending CN117312037A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311297629.2A CN117312037A (en) 2023-10-09 2023-10-09 Memory repair method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117312037A true CN117312037A (en) 2023-12-29

Family

ID=89236854

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311297629.2A Pending CN117312037A (en) 2023-10-09 2023-10-09 Memory repair method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117312037A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination