CN113835923A - Reset system, data processing system and related equipment - Google Patents

Reset system, data processing system and related equipment

Info

Publication number
CN113835923A
Authority
CN
China
Prior art keywords
storage unit
memory
reset
module
replacement information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010588804.3A
Other languages
Chinese (zh)
Inventor
刁阳彬
韩林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN202010588804.3A
Priority to PCT/CN2021/102029 (WO2021259351A1)
Publication of CN113835923A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1008 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices
    • G06F11/1048 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices using arrangements adapted for a specific error detection or correction feature
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C29/00 Checking stores for correct operation; Subsequent repair; Testing stores during standby or offline operation
    • G11C29/52 Protection of memory contents; Detection of errors in memory contents

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Retry When Errors Occur (AREA)
  • Hardware Redundancy (AREA)

Abstract

The embodiment of the application discloses a reset system, a data processing system and related equipment. The reset system comprises a reset control circuit, a processor core and a first register. The fault replacement information recorded by the first register comprises position information of a first storage unit, and the first storage unit is the failed storage unit when fault replacement is performed on a storage unit in the memory. In response to an acquired reset signal, the reset control circuit sends a reset instruction to a second module, wherein the second module comprises the processor core and does not comprise the first module in which the first register is located. A new concept of fault replacement information is provided, and a first register dedicated to recording the fault replacement information is added. After the reset operation is completed, data in the memory can be correctly accessed according to the fault replacement information, so that the data in the memory is not lost even though both a fault replacement technology and a reset technology are used.

Description

Reset system, data processing system and related equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to a reset system, a data processing system, and a related device.
Background
As memory capacity and memory speed increase, the memory failure rate keeps rising. When a storage unit in the memory fails and the failed storage unit is not handled in time, an uncorrectable error (UCE) can easily occur and cause problems such as system downtime, which in turn leads to hardware returns. At present, the probability of a UCE occurring in the memory can be reduced by performing fault replacement processing on the failed storage unit before a UCE occurs. Fault replacement processing refers to writing the data in a failed storage unit of the memory into a backup storage unit of the memory, so as to isolate the failed storage unit.
However, after a storage unit in the memory has been replaced because of a fault, the distribution of data in the memory changes. If the system is subsequently reset, the data in the memory can no longer be accessed correctly after the reset, and the memory data is lost.
Therefore, how to ensure that data in the memory is not lost when both a fault replacement technology and a reset technology are used has become an urgent problem to be solved.
Disclosure of Invention
The application provides a reset system, a data processing system and related equipment. A new concept of fault replacement information is provided, and a first register dedicated to recording the fault replacement information is added, which ensures that the fault replacement information is not lost during a reset. After the reset operation is completed, data in the memory can be correctly accessed according to the fault replacement information, so that the data in the memory is not lost even though both a fault replacement technology and a reset technology are used.
To solve the foregoing technical problem, the application provides the following technical solutions:
In a first aspect, the present application provides a reset system, which may be used in the field of managing memory data. The reset system includes a reset control circuit, a processor core, and a first module. The first module includes a first register used for storing fault replacement information. One piece of fault replacement information includes the location information of the first storage unit corresponding to one fault replacement operation, where the first storage unit is the failed storage unit, that is, the storage unit that is replaced when fault replacement is performed on a storage unit in the memory. The first register may specifically be a status register or a configuration register. The reset control circuit is used for acquiring a hot reset signal and, in response to the acquired hot reset signal, sending a reset instruction to a second module, where the second module includes the processor core and does not include the first module; that is, the reset control circuit sends the reset instruction to the processor core and does not send it to the first module. The reset instruction is used for triggering execution of a reset operation, so that the fault replacement information in the first register is not cleared after the reset operation is completed. Reset refers to restoring the module, unit, or device being reset to its state at first power-on. The hot reset signal is used to trigger a hot reset operation. The reset instruction may be a group of low-level signals that includes at least one low-level signal; the reset instruction may also be a group of electrical signals that includes both low-level and high-level signals.
In this implementation, a new concept of fault replacement information is provided, and a first register dedicated to storing the fault replacement information is added to the reset system. While the reset operation is executed, the first module is controlled not to be reset, so the fault replacement information in the first register is not cleared after the reset operation is completed. Even if some storage units in the memory have been isolated and replaced by fault replacement processing, after the system is reset the fault replacement information still shows which storage units in the memory are isolated failed storage units. System downtime caused by accessing an isolated failed storage unit is thereby avoided, that is, the memory can be accessed correctly, and the data in the memory is not lost even though both a fault replacement technology and a reset technology are used.
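The selective hot reset described above can be pictured with a small software model. This is only an illustrative sketch, not the circuit of the application: the class names `FirstModule`, `ProcessorCore` and `ResetControlCircuit`, the `on_hot_reset` method, and the dictionary used as a register entry are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field


@dataclass
class FirstModule:
    """Hypothetical model of the first module that holds the first register."""
    first_register: list = field(default_factory=list)

    def reset(self):
        # A reset restores the first power-on state, which clears the register.
        self.first_register.clear()


@dataclass
class ProcessorCore:
    """Hypothetical model of the processor core."""
    state: str = "running"

    def reset(self):
        self.state = "power_on"


class ResetControlCircuit:
    """Hypothetical model: forwards a hot reset only to the second module."""

    def __init__(self, core, first_module):
        self.core = core
        self.first_module = first_module

    def on_hot_reset(self):
        # The second module includes the core and excludes the first module,
        # so only the core receives the reset instruction.
        self.core.reset()


core = ProcessorCore()
module = FirstModule()
module.first_register.append({"first_unit": 0x1A40})  # one recorded replacement
core.state = "hung"
ResetControlCircuit(core, module).on_hot_reset()
```

In this model the core returns to its power-on state while the entry in `first_register` survives, which is the property the first aspect relies on.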
In a possible implementation manner of the first aspect, the first module records one or more pieces of fault replacement information, and one piece of fault replacement information further includes the location information of the second storage unit corresponding to one fault replacement operation. The second storage unit is the backup storage unit used when fault replacement is performed on a storage unit in the memory, that is, the storage unit that holds the data after the replacement.
In this implementation manner, the fault replacement information includes at least the location information of the replaced storage unit and the location information of the replacement storage unit. In other words, the fault replacement information records the fault replacement operations that have occurred in the memory: it reflects which storage units in the memory have been replaced and isolated because of a fault, and in which storage unit the data is stored after the replacement, so that the distribution of the data across the storage units of the memory is directly visible.
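As a rough illustration of how such a record could be used, the sketch below models one piece of fault replacement information as a pair of locations and redirects an access accordingly. The names `FaultReplacementInfo`, `first_unit`, `second_unit` and `resolve`, and the use of plain integers as location information, are assumptions made for this example; the application does not specify an encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaultReplacementInfo:
    first_unit: int   # location of the failed (replaced) storage unit
    second_unit: int  # location of the backup (replacement) storage unit


def resolve(location, records):
    """Return the location that actually holds the data for `location`."""
    for record in records:
        if record.first_unit == location:
            # The unit was isolated, so its data lives in the backup unit.
            return record.second_unit
    return location  # the unit was never replaced


records = [FaultReplacementInfo(first_unit=0x10, second_unit=0xF0)]
redirected = resolve(0x10, records)
untouched = resolve(0x11, records)
```

Here an access to the isolated unit `0x10` is redirected to `0xF0`, while an access to `0x11` is left unchanged.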
In a possible implementation manner of the first aspect, one piece of failure replacement information further includes a failure replacement type corresponding to one failure replacement operation, and the failure replacement type may be any one of the following types: memory bank replacement, memory plane replacement, memory granule replacement, memory block replacement, memory row replacement and memory storage cell replacement.
In a possible implementation of the first aspect, the granularity of the failed first storage unit is any one of: memory storage cell, memory row, memory block, memory granule, memory plane and memory bank. The memory storage cell is the storage unit with the smallest granularity in the memory; one memory row includes a row of, that is, a plurality of, memory storage cells; one memory block includes a plurality of memory rows; one memory granule includes a plurality of memory rows; one memory plane includes a plurality of memory granules; and one memory bank includes one or two memory planes. In this implementation manner, because the granularity of the storage unit in the memory may be any of these, the fault replacement information can reflect a fault replacement operation of any of the foregoing granularities. The scheme therefore supports fault replacement at any granularity, which improves its implementation flexibility.
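The granularities and the matching fault replacement types could be encoded, for example, as an ordered enumeration. This is a sketch only: the name `Granularity`, the numeric values, and the lookup table `REPLACEMENT_TYPE` are assumptions for illustration, with the ordering taken from the containment relationships stated in the text.

```python
from enum import IntEnum


class Granularity(IntEnum):
    # Ordered from the smallest unit to the largest; the numbers are arbitrary.
    CELL = 1
    ROW = 2
    BLOCK = 3
    GRANULE = 4
    PLANE = 5
    BANK = 6


# Fault replacement type corresponding to each granularity.
REPLACEMENT_TYPE = {
    Granularity.CELL: "memory storage cell replacement",
    Granularity.ROW: "memory row replacement",
    Granularity.BLOCK: "memory block replacement",
    Granularity.GRANULE: "memory granule replacement",
    Granularity.PLANE: "memory plane replacement",
    Granularity.BANK: "memory bank replacement",
}


def replacement_type_for(granularity):
    """Look up the fault replacement type for a failed unit's granularity."""
    return REPLACEMENT_TYPE[granularity]


smallest = min(Granularity)
bank_type = replacement_type_for(Granularity.BANK)
```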
In a possible implementation manner of the first aspect, the first module further includes at least one second storage unit. When the at least one first storage unit includes a failed storage unit whose granularity is the memory storage cell, the second storage unit in the first module is used to store the data of that first storage unit. Further, the granularity of the second storage units configured in the first module may be the memory storage cell, the memory row, or another granularity, and the number of second storage units in the first module may be 32, 64, or 128. In the embodiment of the application, when a first storage unit whose granularity is the memory storage cell exists among the at least one first storage unit, the data in the failed memory storage cell can be written into a backup storage unit in the first module during fault processing. Because the reset instruction is not sent to the first module during the reset, the data in the backup storage unit is not cleared after the reset operation is completed, which ensures the integrity of the data.
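A minimal sketch of such a pool of backup storage units inside the first module is given below, assuming cell-granularity replacement. The class name `FirstModule`, the methods `replace_cell` and `read`, and the mapping from a failed cell address to a spare index are hypothetical; only the spare counts (32, 64 or 128) come from the text.

```python
class FirstModule:
    """Hypothetical first module with a small pool of backup storage units."""

    def __init__(self, spare_count=32):
        self.spares = [None] * spare_count  # second storage units
        self.first_register = {}            # failed cell address -> spare index

    def replace_cell(self, failed_address, data):
        """Move the data of a failed cell into a backup unit and record it."""
        if failed_address in self.first_register:
            index = self.first_register[failed_address]
        else:
            if len(self.first_register) >= len(self.spares):
                raise RuntimeError("no backup storage unit left")
            index = len(self.first_register)
            self.first_register[failed_address] = index
        self.spares[index] = data
        return index

    def read(self, address):
        """Return the backed-up data for a replaced cell, else None."""
        index = self.first_register.get(address)
        return None if index is None else self.spares[index]


module = FirstModule(spare_count=2)
index = module.replace_cell(0x40, 0xAB)
```

Because the first module is not reset on a hot reset in the first aspect, both the mapping and the backed-up data in this model would remain available afterwards.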
In one possible implementation of the first aspect, the system includes a memory controller, and the first module is integrated in the memory controller. The reset control circuit is specifically configured to send the reset instruction to the processor core and not to the memory controller. In this implementation, the fault replacement information recorded in the first module indicates how the data in the memory is distributed across the storage units, and the memory controller is used to manage the memory. Integrating the first module in the memory controller therefore makes it convenient for the memory controller both to manage the first module and to read the fault replacement information when managing the memory. In addition, directly controlling the whole memory controller not to be reset avoids the problem of different modules inside the memory controller being out of sync after a reset.
In a possible implementation manner of the first aspect, the reset control circuit is further configured to send a reset instruction to both the processor core and the first module when a cold reset signal is acquired. The cold reset signal is used to trigger a cold reset operation, which means that the entire reset system and the memory are restored to the first power-on state; a cold reset is generally performed by powering the system off and on again. In this implementation manner, when the reset control circuit acquires a cold reset signal, this indicates that the reset operation was triggered by a memory fault and that the memory itself needs to be reset, that is, the data in the memory is cleared. There is therefore no need to ensure that the data in the memory is not lost, so the reset operation is also performed on the first module. After the reset operation is completed, new fault replacement information can be written into the first module again, which keeps the whole reset system in a synchronized state.
In a possible implementation manner of the first aspect, the reset control circuit may include a logic circuit. When the reset control circuit acquires the hot reset signal, the output terminal of the reset control circuit is not coupled to the first module; when the reset control circuit acquires a cold reset signal, the output terminal of the reset control circuit is coupled to the first module.
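The gating behaviour of such a logic circuit can be summarised as two Boolean expressions. The function below is a sketch under that reading; the name `reset_targets` and the dictionary keys are assumptions, and a real circuit would drive signal lines rather than return a dictionary.

```python
def reset_targets(hot_reset: bool, cold_reset: bool) -> dict:
    """Decide which parts receive the reset instruction.

    The processor core is reset by either kind of reset; the first module
    is reached only when a cold reset signal is acquired.
    """
    return {
        "processor_core": hot_reset or cold_reset,
        "first_module": cold_reset,
    }


on_hot = reset_targets(hot_reset=True, cold_reset=False)
on_cold = reset_targets(hot_reset=False, cold_reset=True)
```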
In a possible implementation manner of the first aspect, the reset control circuit is further configured to send a first instruction to the first module, where the first instruction instructs the first module not to perform the reset operation.
In a second aspect, the present application provides a data processing system, which may be used in the field of managing memory data. The data processing system includes a processor core and a first module. The first module includes a first register used for storing fault replacement information; the fault replacement information includes the location information of a first storage unit, and the first storage unit is the failed storage unit when fault replacement is performed on a storage unit in the memory. The processor core is used for acquiring the fault replacement information from the first register and writing it into a nonvolatile storage medium, so that the fault replacement information is not lost when the processor core and the first module perform a reset operation. In this implementation, a new concept of fault replacement information is provided, and a first register dedicated to storing the fault replacement information is added to the data processing system. After the memory controller writes fault replacement information into the first module, the processor core writes the newly generated fault replacement information into the nonvolatile storage medium, so that a reset of the data processing system does not cause the fault replacement information to be lost. Even if some storage units in the memory have been isolated and replaced by fault replacement processing, after the system is reset the fault replacement information still shows which storage units in the memory are isolated failed storage units. System downtime caused by accessing an isolated failed storage unit is thereby avoided, that is, the memory can be accessed correctly, and the data in the memory is not lost even though both a fault replacement technology and a reset technology are used.
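The persistence step of the second aspect can be sketched as follows. A JSON file in a temporary directory stands in for the nonvolatile storage medium, which is purely an assumption for illustration; the function names `persist_fault_replacement_info` and `load_fault_replacement_info` and the record layout are likewise hypothetical.

```python
import json
import os
import tempfile


def persist_fault_replacement_info(first_register, nvm_path):
    """Copy the contents of the first register to the stand-in NVM file."""
    tmp_path = nvm_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as handle:
        json.dump(first_register, handle)
        handle.flush()
        os.fsync(handle.fileno())  # make sure the data reaches the medium
    os.replace(tmp_path, nvm_path)  # swap in the complete file in one step


def load_fault_replacement_info(nvm_path):
    """Read the saved fault replacement information set, if any."""
    if not os.path.exists(nvm_path):
        return []
    with open(nvm_path, encoding="utf-8") as handle:
        return json.load(handle)


nvm_path = os.path.join(tempfile.mkdtemp(), "fault_replacement.json")
first_register = [{"first_unit": 16, "second_unit": 240}]
persist_fault_replacement_info(first_register, nvm_path)
first_register = []  # simulate the register being cleared by a reset
restored = load_fault_replacement_info(nvm_path)
```

Writing to a temporary file and then replacing the target is one way to avoid leaving a half-written record if a reset arrives mid-write; the application itself does not prescribe how the write is made robust.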
In a possible implementation manner of the second aspect, the failure replacement information further includes location information of a second storage unit, where the second storage unit is a backup storage unit when the storage unit in the memory is replaced with the failure.
In one possible implementation of the second aspect, the granularity of the failed storage unit is any one of: memory storage unit cells, memory rows, memory blocks, memory granules, memory planes and memory banks.
In one possible implementation of the second aspect, the system includes a memory controller, and the first module is integrated in the memory controller.
In a possible implementation manner of the second aspect, the processor core is further configured to, when the reset operation is a hot reset operation, obtain the fault replacement information set from the nonvolatile storage medium and backfill it into the first register during the reset of the first register. The fault replacement information set includes at least one piece of fault replacement information. In this implementation, the processor core obtains the fault replacement information set from the nonvolatile storage medium and backfills it directly into the first module while the first module is being reset, so that after the data processing system is reset the memory controller can use the fault replacement information in the first module directly to access the memory accurately. The operation is simple and easy to implement.
In a possible implementation manner of the second aspect, the first module further includes at least one second storage unit. When the at least one first storage unit includes a storage unit whose granularity is the memory storage cell, the second storage unit in the first module is used to store first data, namely the data of that first storage unit. The processor core is further used for acquiring the first data from the second storage unit in the first module and writing the first data into the nonvolatile storage medium, so that the first data is not lost when the processor core and the first module perform a reset operation. The processor core is further used for, when the reset operation is a hot reset operation, acquiring the fault replacement information set and the first data from the nonvolatile storage medium and, during the reset of the first module, backfilling the fault replacement information set into the first register and the first data into the second storage unit in the first module, where the fault replacement information set includes at least one piece of fault replacement information. In this implementation manner, the first data stored in the second storage unit in the first module is also written into the nonvolatile storage medium and is backfilled into the first module when the first module is reset, which prevents the first data from being lost and ensures the integrity of the data.
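The backfill step can be sketched as below. The function name `reset_first_module`, the string values `"hot"` and `"cold"`, and the list-based register are assumptions for illustration; the cold branch follows the separate implementation of the second aspect in which the processor core does not obtain the fault replacement information set on a cold reset.

```python
def reset_first_module(first_register, nvm_records, reset_kind):
    """Reset the first register, backfilling it only on a hot reset."""
    if reset_kind not in ("hot", "cold"):
        raise ValueError("reset_kind must be 'hot' or 'cold'")
    first_register.clear()  # the reset itself initialises the register
    if reset_kind == "hot":
        # Hot reset: memory contents are kept, so the mapping must be restored.
        first_register.extend(nvm_records)
    # Cold reset: memory is cleared as well, so nothing is backfilled.
    return first_register


nvm_records = [{"first_unit": 16, "second_unit": 240}]
after_hot = reset_first_module([{"stale": True}], nvm_records, "hot")
after_cold = reset_first_module([{"stale": True}], nvm_records, "cold")
```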
In a possible implementation manner of the second aspect, the processor core is further configured to perform a reset operation on the first module to initialize the first module, acquire the failure replacement information set from the nonvolatile storage medium in a case that the reset operation is a hot reset operation, and perform a reverse replacement operation on data in the storage unit of the memory according to the failure replacement information set. The reverse replacement operation is used for rewriting the data in the second storage unit into the first storage unit so as to restore the distribution condition of the data in the memory in the storage units to an initial state. Further, restoring the distribution of the data in the memory in the storage unit to the initial state does not mean to clear the data in the memory, but means to store the data in the memory according to the storage mode before the fault replacement technology is executed.
In this implementation, a fault in the processor core or in the memory controller may also cause a storage unit in the memory to satisfy the fault replacement condition. Such a storage unit may become usable again after the processor core and the memory controller are reset. Therefore, after the processor core and the memory controller are reset, a reverse replacement operation is performed on the data in the storage units of the memory, which releases the backup storage units and helps prolong the service life of the memory.
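The reverse replacement operation can be illustrated with a toy memory model. The dictionary standing in for the memory, the tuple layout `(first_unit, second_unit)`, and the function name `reverse_replace` are assumptions made for this sketch.

```python
def reverse_replace(memory, records):
    """Move data from each backup unit back to the unit it replaced.

    `memory` maps a location to its contents; `records` is a list of
    (first_unit, second_unit) pairs taken from the saved fault
    replacement information set.
    """
    for first_unit, second_unit in records:
        memory[first_unit] = memory[second_unit]  # rewrite into the first unit
        memory[second_unit] = None                # release the backup unit
    return []  # no replacements remain in effect afterwards


memory = {0x10: None, 0xF0: "payload", 0x20: "other"}
records = [(0x10, 0xF0)]
remaining = reverse_replace(memory, records)
```

The data itself is preserved; only its placement returns to the layout that existed before the fault replacement was performed.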
In a possible implementation manner of the second aspect, the processor core is further configured to not acquire the failure replacement information set from the non-volatile storage medium if the reset operation is a cold reset operation.
In one possible implementation of the second aspect, the processor core is further configured to not obtain the failure replacement information set and the first data from the non-volatile storage medium if the reset operation is a cold reset operation.
For the concepts of the terms in the second aspect and the possible implementation manners of the second aspect, specific implementation steps, and beneficial effects brought by each possible implementation manner, reference may be made to descriptions in various possible implementation manners in the first aspect, and details are not described here any more.
In a third aspect, the present application provides a reset method, which may be used in the field of managing memory data. The method is applied to a reset system that includes a reset control circuit, a processor core and a first module. The first module includes a first register used for storing fault replacement information; the fault replacement information includes the location information of a first storage unit, and the first storage unit is the failed storage unit when fault replacement is performed on a storage unit in the memory. The reset control circuit acquires a hot reset signal and, in response to the acquired hot reset signal, sends a reset instruction to the second module, where the second module includes the processor core and does not include the first module, and the reset instruction is used for triggering execution of a reset operation.
For the specific implementation steps of the third aspect and its various possible implementation manners, and for the beneficial effects brought by each possible implementation manner, reference may be made to the descriptions of the various possible implementation manners of the first aspect; details are not repeated here.
In a fourth aspect, the present application provides a data processing method, which may be used in the field of managing memory data. The method is applied to a data processing system, the data processing system comprises a processor core and a first module, the first module comprises a first register, the first register is used for storing fault replacement information, the fault replacement information comprises position information of a first storage unit, and the first storage unit is a storage unit which has faults when the storage unit in a memory is subjected to fault replacement. The processor core acquires the fault replacement information from the first register; the processor core writes the fault replacement information into the non-volatile storage medium so that the fault replacement information is not lost when the processor core and the first module perform a reset operation.
For specific implementation steps of the fourth aspect and various possible implementation manners of the fourth aspect and beneficial effects brought by each possible implementation manner, reference may be made to descriptions in various possible implementation manners of the second aspect, and details are not repeated here.
In a fifth aspect, the present application provides a computer device, wherein the reset system of the first aspect is configured in the computer device, or the data processing system of the second aspect is configured in the computer device.
In a sixth aspect, the present application provides a chip system comprising a processor for enabling the functionality referred to in the above aspects, e.g. to send or process data and/or information referred to in the above methods. In one possible design, the system-on-chip further includes a memory for storing program instructions and data necessary for the server or the communication device. The chip system may be formed by a chip, or may include a chip and other discrete devices.
Drawings
Fig. 1 is a schematic structural diagram of a reset system according to an embodiment of the present application;
Fig. 2 is a schematic workflow diagram of a reset system according to an embodiment of the present application;
Fig. 3 is a schematic diagram of a fault replacement technique in a reset method according to an embodiment of the present application;
Fig. 4 is a system diagram of a reset system provided in an embodiment of the present application;
Fig. 5 is a schematic workflow diagram of a data processing system according to an embodiment of the present application;
Fig. 6 is a schematic diagram illustrating a reverse replacement operation in the data processing method according to the embodiment of the present application;
Fig. 7 is a system diagram of a reset system provided in an embodiment of the present application;
Fig. 8 is another system diagram of a reset system provided in accordance with an embodiment of the present application;
Fig. 9 is a system diagram of a data processing system according to an embodiment of the present application;
Fig. 10 is another system diagram of a data processing system according to an embodiment of the present application;
Fig. 11 is a schematic diagram of a computer device in accordance with an embodiment of the present invention.
Detailed Description
The embodiment of the application provides a reset system, a data processing system and related equipment, provides a new concept of fault replacement information, and adds a first register specially used for recording the fault replacement information, so that the fault replacement information is ensured not to be lost in the reset process, and after the reset operation is completed, data in a memory can be correctly accessed according to the fault replacement information, so that the data in the memory is not lost on the premise of using a fault replacement technology and a reset technology in the memory.
The terms "first," "second," and the like in the description and in the claims of the present application and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and are merely descriptive of the various embodiments of the application and how objects of the same nature can be distinguished. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of elements is not necessarily limited to those elements, but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Embodiments of the present application are described below with reference to the accompanying drawings. As can be known to those skilled in the art, with the development of technology and the emergence of new scenarios, the technical solution provided in the embodiments of the present application is also applicable to similar technical problems.
The reset system provided by the embodiment of the application is mainly applied to devices that process memory data. To facilitate understanding, the reset system is first introduced with reference to fig. 1. Fig. 1 is a schematic structural diagram of a reset system provided in the embodiment of the present application, and may also be regarded as a schematic structural diagram of a data processing system provided in the embodiment of the present application. The reset system includes a processor and a memory, which can be configured in any type of electronic device. The processor integrates a processor core (core), a reset control circuit, a memory controller (DDRC) and a high-speed physical interface transceiver (HSPHY).
The processor core may be loaded with a software system for providing basic functions of the operating system. The reset control circuit is used for triggering a module or a unit in the processor to execute reset operation and triggering the memory to execute reset operation.
The memory controller is used for realizing the conversion from the address in the access request sent by the processor core to the physical address in the memory, transmitting the access request to the HSPHY, efficiently scheduling the access request sent by the processor core and performing fault replacement operation on a storage unit in the memory.
The HSPHY is communicatively connected to the memory outside the processor. It is used for acquiring digital signals generated by the memory controller, converting the digital signals into electric signals and transmitting the electric signals to the memory; the HSPHY is also used for acquiring the electric signals generated by the memory, converting the electric signals into digital signals and transmitting the digital signals to the memory controller.
It should be noted that in an actual product, the processor may include more or fewer modules or units. In addition, the memory controller may not be integrated in the processor, that is, the memory controller and the processor may be two independent devices. Fig. 1 is only an example provided to facilitate understanding of an application environment of the present solution, and does not limit the present solution.
Based on the above description, the embodiment of the application provides a reset system in which a first module used for storing fault replacement information is newly added. The fault replacement information reflects the distribution of data in the memory. Therefore, as long as the fault replacement information is not lost when the reset system is reset, the data in the memory can still be correctly accessed according to the fault replacement information after the system is reset, so that the data in the memory is not lost on the premise that both a fault replacement technology and a reset technology are used. Specifically, in one implementation, the first module is controlled not to perform the reset operation during the reset process of the reset system, so as to avoid the loss of the fault replacement information; in another implementation, the fault replacement information in the first module is written into a non-volatile storage medium outside the reset system, so that the reset of the reset system does not result in the loss of the fault replacement information. Because the specific operation of the two implementations differs considerably, they are described separately below.
First, the fault replacement information is not cleared
In an embodiment of the present application, please refer to fig. 2, where fig. 2 is a schematic diagram of a workflow of a resetting system provided in an embodiment of the present application, and the workflow of the resetting system provided in the embodiment of the present application may include:
201. The processor core sends a fault replacement instruction to the memory controller.
In the embodiment of the application, during operation of the reset system, the processor core can learn in real time whether the memory has failed. When a certain memory cell (cell) in the memory fails, the memory controller corrects the data in the memory cell according to an Error Checking and Correction (ECC) algorithm; if the data in the memory cell is successfully corrected, the current fault is a Correctable Error (CE), and description information corresponding to the CE error is then generated and recorded.
The memory may include one or more memory banks, one memory bank may include one or two memory planes (rank), one memory plane may include a plurality of memory granules (device), one memory granule may include a plurality of memory blocks (bank), one memory block may include a plurality of memory lines (row), and one memory line may include a plurality of memory storage cells (cell).
The description information includes at least location information of the occurrence of the CE error. The location information of the CE error is used to indicate the location of the storage unit in the memory where the CE error occurs, that is, the location information in the description information may indicate which storage unit in the memory the CE error occurs. The memory unit in the memory in the embodiment of the present application may specifically refer to one or more of the following: memory storage unit cells, memory rows, memory blocks, memory granules, memory planes and memory banks. Optionally, the description information may further include a CE error type. The CE error types include, but are not limited to, a CE error generated when the processor core accesses the memory, a CE error generated when the memory is periodically checked, or other types of CE errors, and the like, and are not exhaustive here. In the embodiment of the present application, the granularity of the storage unit may be any one of a memory storage cell, a memory row, a memory block, a memory granule, a memory plane, or a memory bank, that is, the fault replacement information may reflect the fault replacement operation of any one of the foregoing granularities, that is, the scheme supports the fault replacement operation of any one of the granularities, and the implementation flexibility of the scheme is improved.
After the processor core acquires the description information corresponding to the CE error, the processor core may determine the specific location of the CE error in the memory, and further determine whether the storage unit in the memory meets the fault replacement condition. If the fault replacement condition is met, the processor core sends a fault replacement instruction to the memory controller; if it is not met, the processor core can continue to monitor the memory.
The fault replacement can be divided into a plurality of types according to different granularities: memory bank replacement, memory plane replacement, memory granule replacement, memory block replacement, memory row replacement and memory storage cell replacement.
Correspondingly, the fault replacement condition may include a memory bank fault replacement condition, a memory plane fault replacement condition, a memory granule fault replacement condition, a memory block fault replacement condition, a memory row fault replacement condition, and a memory storage cell fault replacement condition. Further, the memory bank fault replacement condition may specifically be that the number of times that the same memory bank has CE errors is greater than or equal to a first preset threshold, or the memory bank fault replacement condition may specifically be that the number of times that the same memory bank has CE errors of the same type is greater than or equal to a second preset threshold, and the like, and the memory bank fault replacement condition may also be other conditions, and values of the first preset threshold and the second preset threshold may both be flexibly set with reference to an actual situation, which is not limited herein. The meanings of the memory plane replacement condition, the specific memory granule replacement condition, the memory block failure replacement condition, the memory row failure replacement condition, and the memory storage cell failure replacement condition are similar to those of the memory bank failure replacement condition, and can be understood by referring to the foregoing description, which is not described herein again.
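The threshold comparison described above can be sketched as follows. This is only an illustration under stated assumptions: the class name `CeTracker`, the unit labels, the error type labels and the threshold values 5 and 3 are hypothetical, and the two conditions, which the text presents as alternatives, are combined here with a logical OR.

```python
from collections import Counter


class CeTracker:
    """Hypothetical CE-error bookkeeping; names and thresholds are illustrative."""

    def __init__(self, first_threshold, second_threshold):
        self.first_threshold = first_threshold    # total CE errors per storage unit
        self.second_threshold = second_threshold  # CE errors of one type per storage unit
        self.total = Counter()
        self.by_type = Counter()

    def record(self, unit, ce_type):
        """Record one CE error described by its location and its type."""
        self.total[unit] += 1
        self.by_type[(unit, ce_type)] += 1

    def meets_condition(self, unit):
        """Return True if either threshold is reached for the given storage unit."""
        if self.total[unit] >= self.first_threshold:
            return True
        return any(
            count >= self.second_threshold
            for (recorded_unit, _), count in self.by_type.items()
            if recorded_unit == unit
        )


tracker = CeTracker(first_threshold=5, second_threshold=3)
for _ in range(3):
    tracker.record("bank-0", "access")   # three errors of the same type
tracker.record("bank-1", "patrol")       # a single error
```

In this sketch, "bank-0" reaches the second threshold and would trigger a fault replacement instruction, while "bank-1" would only continue to be monitored.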
The fault replacement instruction at least carries the position information of the fault storage unit and the position information of the backup storage unit. The location information of the failed storage unit can be specifically represented as a character string, wherein the character string is the code of the replaced storage unit; the location information of the backup memory cell may be represented as a character string, which is a code of the memory cell after replacement. Optionally, the fault replacement instruction may further carry a fault replacement type, where the fault replacement type may also be expressed as a character code, for example, 00 represents that the fault replacement type is memory block replacement, 01 represents that the fault replacement type is memory plane replacement, and the like, which are not exhaustive here.
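As a rough illustration of the fields carried by a fault replacement instruction, the following sketch packs the two location codes and the optional type code into one string. Only the type codes 00 and 01 come from the text; the unit codes such as `r10d5b14`, the `|` separator and the class name are assumptions made for this example.

```python
from dataclasses import dataclass

# Type codes given as examples in the text; other codes are not specified there.
REPLACEMENT_TYPES = {"00": "memory block", "01": "memory plane"}


@dataclass(frozen=True)
class FaultReplacementInstruction:
    faulty_unit: str      # code of the storage unit being replaced
    backup_unit: str      # code of the storage unit that replaces it
    type_code: str = ""   # optional fault replacement type

    def encode(self):
        """Join the fields into one string (this field layout is hypothetical)."""
        return "|".join((self.type_code, self.faulty_unit, self.backup_unit))

    @classmethod
    def decode(cls, raw):
        """Rebuild an instruction from the string produced by encode()."""
        type_code, faulty_unit, backup_unit = raw.split("|")
        return cls(faulty_unit, backup_unit, type_code)

    def type_name(self):
        return REPLACEMENT_TYPES.get(self.type_code, "unspecified")


instr = FaultReplacementInstruction("r10d5b14", "r18d13b22", "00")
wire = instr.encode()
```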
202. The memory controller performs fault replacement processing on the storage unit in the memory according to the received fault replacement instruction and writes fault replacement information into the first register.
In the embodiment of the application, after receiving the fault replacement instruction, the memory controller can learn, from the location information of the faulty storage unit and the location information of the backup storage unit, which storage unit in the memory needs to be replaced and isolated and where the backup storage unit that replaces it is located. The memory controller then performs fault replacement processing on the storage unit in the memory: in one fault replacement operation, the memory controller reads the data from the faulty storage unit and writes the data into the backup storage unit. The memory controller may be integrated into the processor, or may be a device independent of the processor. Optionally, in a fault replacement operation, the memory controller further needs to perform data reorganization on the storage units in the memory.
It should be noted that the failed storage unit is necessarily located in the memory, but the backup storage unit is not necessarily located in the memory. When the granularity of the failed memory cell is the memory storage cell, the backup memory cell for storing the data in the failed memory storage cell may be integrated in the first module. That is, when the granularity of a certain faulty memory cell is a memory storage cell, the memory controller writes the data in the faulty memory storage cell into the backup memory cell in the first module. When the granularity of the failed storage unit is a memory row, a memory block, a memory granule, a memory plane, or a memory bank, the corresponding backup storage unit may be disposed in the memory.
To understand the processing procedure of the fault replacement technique more intuitively, please refer to fig. 3, which is a schematic diagram of the fault replacement technique in the reset method according to the embodiment of the present application. In fig. 3, the storage unit that needs fault replacement is a memory granule, as an example. Fig. 3 includes three sub-diagrams (a), (b), and (c). Sub-diagram (a) represents the data distribution in a memory bank before the fault replacement operation is performed. As shown in sub-diagram (a), a memory bank includes two memory planes (Rank A and Rank B, respectively), and each memory plane includes 18 memory granules: 16 granules for normally storing data, one ECC granule and one parity granule. The ECC granule may also be regarded as a backup granule: when a faulty granule satisfying the fault replacement condition appears among granule 0 to granule 15, the data in the faulty granule is written into the ECC granule to achieve replacement and isolation of the faulty granule. Sub-diagram (b) indicates that granule 1 in Rank A is a faulty granule, so the data of granule 1 in Rank A needs to be written into the ECC granule of Rank A, after which Rank A loses its error correction capability. To enable Rank A to still have error correction capability after the granule replacement, data reorganization may be performed on Rank A and Rank B so that Rank A and Rank B share the ECC granule of Rank B; that is, the two 16+2 storage modes in sub-diagram (a) are changed into the 32+3 mode in sub-diagram (c). It should be understood that the example in fig. 3 is only for convenience of understanding the fault replacement technique and is not used to limit the present solution.
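The granule counts in the fig. 3 example can be checked with a small sketch. It assumes, as one reading of the description, that the three non-data granules in the 32+3 mode are the parity granule of Rank A plus the ECC granule and parity granule of Rank B; the granule labels and function names are hypothetical.

```python
def build_rank(name):
    """Roles of the 18 granules of one memory plane before any replacement."""
    roles = {f"{name}-granule{i}": "data" for i in range(16)}
    roles[f"{name}-ecc"] = "ecc"
    roles[f"{name}-parity"] = "parity"
    return roles


def replace_granule(faulty):
    """Isolate a faulty data granule and reuse its rank's ECC granule for data."""
    roles = {**build_rank("A"), **build_rank("B")}
    if roles.get(faulty) != "data":
        raise ValueError("only data granules are replaced in this sketch")
    rank = faulty.split("-")[0]
    roles[faulty] = "isolated"       # the faulty granule is no longer used
    roles[f"{rank}-ecc"] = "data"    # the ECC granule takes over its data
    return roles


roles = replace_granule("A-granule1")
data_count = sum(role == "data" for role in roles.values())
check_count = sum(role in ("ecc", "parity") for role in roles.values())
```

Of the 36 granules, one is isolated, 32 hold data and 3 remain for checking, which matches the 32+3 mode.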
After the memory unit in the memory is subjected to the failure replacement processing, the memory controller writes failure replacement information into the first register. Wherein, there can be a plurality of fault replacement operations in the operation process of the reset system, and one fault replacement information is used for recording one fault replacement operation. The failure replacement information includes location information of a first storage unit and location information of a second storage unit, the first storage unit is a storage unit in which a CE error exists when a storage unit in the memory is replaced with a failure (i.e., a storage unit that is replaced after the failure replacement is performed), and the second storage unit is a backup storage unit when the storage unit in the memory is replaced with a failure (i.e., a storage unit that is used after the failure replacement is performed). Therefore, the fault replacement information can reflect the distribution condition of the data in the memory in the storage unit after the fault replacement processing is carried out on the storage unit with the fault in the memory.
Optionally, one piece of fault replacement information may further include the granularity level of the faulty storage unit, the CE error type of the replaced faulty storage unit, or other types of information. In the embodiment of the application, the fault replacement information includes at least the location information of the replaced storage unit and the location information of the storage unit that replaces it. That is, the fault replacement information records the fault replacement operations that have occurred in the memory: it reflects which storage units in the current memory have been replaced and isolated due to faults, and also in which storage unit the data is stored after the fault replacement, so that the distribution of the data in the memory among the storage units is intuitively reflected.
To further understand the concept of the failure replacement information, the following takes the granularity level of the storage unit performing the failure replacement as a memory block as an example, and further introduces the failure replacement information through table 1.
TABLE 1
Figure BDA0002555632720000091
The region0 in table 1 indicates that the replacement operation corresponding to the replacement information occurs in the region (region) numbered 0 in the memory. region0-enable is a field in the fault replacement information that indicates whether fault replacement has been performed in region0, and the code 0 in region0-enable indicates that fault replacement has been performed in region0. region0-size is a field in the fault replacement information indicating the granularity of the storage unit for which fault replacement is performed in region0, and the code 00 in region0-size indicates that this granularity is a memory block (bank). region0-rank indicates the number of the memory plane in which the memory block needing fault replacement in region0 is located, region0-device indicates the number of the memory granule in which that memory block is located, and region0-bank indicates the number of the memory block needing fault replacement in region0. Together, region0-rank, region0-device and region0-bank indicate the location of the first storage unit; as shown in table 1, the first storage unit is the memory block numbered 14 in the memory granule numbered 5 in the memory plane numbered 10 in region0.
region0-buddy-rank indicates the number of the memory plane in which the backup memory block is located, region0-buddy-device indicates the number of the memory granule in which the backup memory block is located, and region0-buddy-bank indicates the number of the backup memory block. Together, region0-buddy-rank, region0-buddy-device and region0-buddy-bank indicate the location of the second storage unit; as shown in table 1, the second storage unit is the memory block numbered 22 in the memory granule numbered 13 in the memory plane numbered 18 in region0. It should be understood that in practical situations, the fault replacement information may include more or less information, and the example in table 1 is only for convenience of understanding the concept of the fault replacement information and is not used to limit the present solution.
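The fields described for table 1 can be modeled as a small record, together with one possible way of using it to redirect an access. The field values are those stated in the description of table 1; the class names, the `translate` helper and the way the enable code is interpreted in code are assumptions for illustration only.

```python
from dataclasses import dataclass

ENABLED_CODE = 0  # per the description, code 0 in region0-enable means replacement was performed


@dataclass(frozen=True)
class Location:
    rank: int     # memory plane number
    device: int   # memory granule number
    bank: int     # memory block number


@dataclass(frozen=True)
class FaultReplacementInfo:
    region: int
    enable: int
    size: str           # "00" means the granularity is a memory block (bank)
    first: Location     # replaced (faulty) storage unit
    second: Location    # backup storage unit (the "buddy" fields)

    def translate(self, target):
        """Redirect an access aimed at the replaced block to its backup block."""
        if self.enable == ENABLED_CODE and target == self.first:
            return self.second
        return target


info = FaultReplacementInfo(
    region=0,
    enable=0,
    size="00",
    first=Location(rank=10, device=5, bank=14),
    second=Location(rank=18, device=13, bank=22),
)
redirected = info.translate(Location(10, 5, 14))
untouched = info.translate(Location(1, 2, 3))
```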
The first register belongs to the first module and stores the fault replacement information. The first register may be embodied as a status register, a configuration register, or another type of register, which is not limited herein. The first module may be integrated in the memory controller. Further, one first register stores one piece of fault replacement information, and a plurality of first registers may be configured in the first module to record a plurality of pieces of fault replacement information.
Optionally, the first module may further be configured with at least one second storage unit. When at least one first storage unit is a faulty storage unit whose granularity is a memory storage cell, the second storage unit in the first module is configured to store the data of that first storage unit. In the embodiment of the application, because the reset instruction is not sent to the first module during the reset process, the data in the backup storage unit is not cleared after the reset operation is completed, which ensures the integrity of the data.
Further, the granularity of the backup storage units configured in the first module may be memory storage unit cells, memory lines, or other granularities, and the number of the backup storage units in the first module may be 32, 64, 128, or other numbers.
Optionally, the reset system may further be configured with a second register, where the second register is used to record a state of the memory controller during the fault replacement operation, where the state may include that no fault replacement operation occurs, that the fault replacement operation is in progress, that the fault replacement operation is successful, that the fault replacement operation fails, or other types of states, and the like, which is not limited herein. Further, the second register may also be integrated in the memory controller. Further, one or more sets of registers may be configured in the reset system, each set of registers including a first register and a second register.
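The interplay of step 202 with the first and second registers might be sketched as follows. The class `MemoryControllerSketch`, the dictionary used to stand in for the memory, and the unit labels are hypothetical; the four states mirror the ones listed above.

```python
from enum import Enum


class ReplaceStatus(Enum):
    NONE = "no fault replacement operation has occurred"
    IN_PROGRESS = "fault replacement operation in progress"
    SUCCESS = "fault replacement operation succeeded"
    FAILED = "fault replacement operation failed"


class MemoryControllerSketch:
    """Hypothetical model: a dict stands in for the memory, lists for registers."""

    def __init__(self, memory):
        self.memory = memory                       # unit code -> stored data
        self.first_registers = []                  # one record per fault replacement
        self.second_register = ReplaceStatus.NONE  # state of the latest operation

    def replace(self, faulty, backup):
        """Copy data from the faulty unit to the backup unit and record it."""
        self.second_register = ReplaceStatus.IN_PROGRESS
        if faulty not in self.memory or backup not in self.memory:
            self.second_register = ReplaceStatus.FAILED
            return False
        self.memory[backup] = self.memory[faulty]       # read, then write
        self.first_registers.append((faulty, backup))   # fault replacement information
        self.second_register = ReplaceStatus.SUCCESS
        return True


mc = MemoryControllerSketch({"cell-7": b"\x2a", "spare-0": b""})
ok = mc.replace("cell-7", "spare-0")
bad = mc.replace("cell-9", "spare-0")   # unknown unit, so the operation fails
```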
203. The reset control circuit acquires a reset signal.
In the embodiment of the application, a reset operation may need to be performed during operation of the reset system, so the reset control circuit may acquire a reset signal. Reset refers to restoring the reset module/unit/device to its initial power-on state. The reset control circuit may be integrated within the processor. Optionally, after receiving the reset signal, the reset control circuit may determine whether a hot reset signal or a cold reset signal has been received. The cold reset signal is generally caused by a memory fault and is used for triggering a cold reset operation; the cold reset operation means that the whole reset system and the memory need to be restored to the initial power-on state, and it can generally be performed by powering off or powering on. The hot reset signal is generally caused by a non-memory fault and is used for triggering a hot reset operation; the hot reset operation means that some of the modules/units/devices are not reset during the reset process of the reset system.
Specifically, in one implementation, the reset control circuit may include a first pin and a second pin. If the reset control circuit acquires the reset signal from the first pin, the reset control circuit has acquired a cold reset signal; if the reset control circuit acquires the reset signal from the second pin, the reset control circuit has acquired a hot reset signal. The reset signal in this implementation may be represented as a set of low level signals, which may include one or more low level signals. In another implementation, the reset control circuit obtains the cold reset signal and the hot reset signal from the same signal source, and the cold reset signal and the hot reset signal are embodied as different electrical signals. For example, the cold reset signal is embodied as a 01 signal, a 0101 signal, or a 0011 signal, and the hot reset signal is embodied as a 10 signal, a 1010 signal, or a 1100 signal, where "0" refers to a low level signal and "1" refers to a high level signal. Therefore, the reset control circuit can determine whether the signal is a cold reset signal or a hot reset signal according to the form of the received electrical signal. It should be understood that this example of the cold reset signal and the hot reset signal is only for convenience in understanding the scheme and is not used for limiting the scheme.
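For the single-source implementation, distinguishing the two signals by their level pattern can be sketched as below. The bit patterns are the examples given in the text; representing a signal as a string of "0" and "1" characters and the function name `classify_reset_signal` are assumptions for illustration.

```python
# Example patterns from the text: "0" is a low level signal, "1" a high level signal.
COLD_PATTERNS = {"01", "0101", "0011"}
HOT_PATTERNS = {"10", "1010", "1100"}


def classify_reset_signal(bits):
    """Return "cold" or "hot" according to the form of the received signal."""
    if bits in COLD_PATTERNS:
        return "cold"
    if bits in HOT_PATTERNS:
        return "hot"
    raise ValueError(f"not a recognized reset signal: {bits!r}")


kinds = [classify_reset_signal(b) for b in ("0101", "1100")]
```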
204. The reset control circuit sends a reset instruction to a second module, wherein the second module comprises the processor core and does not comprise the first module.
In some embodiments of the present application, after acquiring the reset signal, the reset control circuit, in response to the acquired reset signal, controls the processor core to perform the reset operation, controls the first module not to perform the reset operation, and also controls the memory not to perform the reset operation. That is, the reset control circuit sends a reset instruction to the second module, which includes the processor core and does not include the first module. Optionally, the second module may further include other modules in the reset system besides the first module, as long as it is ensured that the first module and the memory do not perform the reset operation. It should be noted that the second module may be a conceptual module defined by artificial division.
Specifically, the reset control circuit responds to the acquired reset signal, sends a reset instruction to the processor core, and does not send the reset instruction to the first module and the memory, where the reset instruction is used to trigger execution of a reset operation, so as to control the processor core to execute the reset operation, and control the first module not to execute the reset operation, so that data stored in the first module is not reset, that is, the data stored in the first module is not cleared. The reset instruction can be a group of low-level signals, and the group of low-level signals comprises at least one low-level signal; the reset command may also be a set of electrical signals including both a low level signal and a high level signal, and the like, which is not limited herein.
Further, the reset control circuit sends a reset instruction to the processor core and does not send the reset instruction to the first module and the memory when the acquired reset signal is a hot reset signal, and sends a reset instruction to the processor core, the first module and the memory when the acquired reset signal is a cold reset signal. That is, the first module and the memory are controlled not to execute the reset operation only when the reset control circuit acquires the hot reset signal.
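The selection of reset targets for the two signal types can be sketched as follows. The `Module` class, the state dictionaries and the function `dispatch_reset` are hypothetical stand-ins; an empty state is used here to model the initial power-on state.

```python
class Module:
    """Hypothetical stand-in for a resettable module or device."""

    def __init__(self, name, state):
        self.name = name
        self.state = dict(state)

    def reset(self):
        # Reset restores the initial power-on state, modeled here as empty state.
        self.state.clear()


def dispatch_reset(kind, core, first_module, memory):
    """Send the reset instruction only to the targets selected by the signal type."""
    if kind not in ("hot", "cold"):
        raise ValueError(f"unknown reset kind: {kind}")
    targets = [core]
    if kind == "cold":
        targets += [first_module, memory]
    for module in targets:
        module.reset()
    return [module.name for module in targets]


core = Module("processor core", {"pc": 0x1000})
first_module = Module("first module", {"region0": ((10, 5, 14), (18, 13, 22))})
memory = Module("memory", {"data": "payload"})

hot_targets = dispatch_reset("hot", core, first_module, memory)
state_after_hot = dict(first_module.state)   # fault replacement information survives
cold_targets = dispatch_reset("cold", core, first_module, memory)
```

After the hot reset in this sketch, the fault replacement information in the first module is still present; after the cold reset, the first module and the memory are both cleared.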
Optionally, the reset control circuit may further send a first instruction to the first module and the memory, respectively, where the first instruction indicates that the reset operation is not to be performed. Therefore, the processor core executes the reset operation after receiving the reset instruction, and the first module and the memory do not execute the reset operation after receiving the first instruction, so that the processor core is controlled to execute the reset operation, and the first module and the memory are controlled not to execute the reset operation.
Specifically, the manner in which the reset control circuit sends the first instruction to the first module is as follows. In one case, the reset instruction and the first instruction may be represented by two different electrical signals, so that the reset control circuit may send the reset instruction to the processor core and the first instruction to the first module by sending different electrical signals to the processor core and the first module. Correspondingly, the first module may determine whether the reset instruction or the first instruction has been received according to the type of the received electrical signal. As an example, the reset instruction is 111000 and the first instruction is 000111, where "0" refers to a low level signal and "1" refers to a high level signal. In another case, the first module may be provided with a third pin and a fourth pin. If the reset control circuit wants to send a reset instruction to the first module, the reset control circuit sends an instruction to the third pin; correspondingly, if the first module acquires an instruction through the third pin, the acquired instruction is regarded as a reset instruction. If the reset control circuit wants to send a first instruction to the first module, the reset control circuit sends an instruction to the fourth pin; correspondingly, if the first module acquires an instruction through the fourth pin, the acquired instruction is regarded as the first instruction.
The implementation manner of sending the first instruction to the memory by the reset control circuit is similar to the implementation manner of sending the first instruction to the first module by the reset control circuit, and is not described here again.
Further, the reset control circuit sends a reset instruction to the processor core and sends a first instruction to the first module and the memory when the acquired reset signal is a hot reset signal. The reset control circuit sends a reset instruction to the processor core, the first module and the memory when the acquired reset signal is a cold reset signal. That is, the first module and the memory are controlled not to execute the reset operation only when the reset control circuit acquires the hot reset signal.
Further optionally, if the first module is integrated in the memory controller, step 204 may include: after acquiring the reset signal, the reset control circuit sends a reset instruction to the processor core in response to the acquired reset signal, and does not send the reset instruction to the memory controller. That is, the reset control circuit sends a reset instruction to the second module, which includes the processor core and does not include the memory controller. The specific implementation manner of the reset control circuit controlling the processor core to execute the reset operation is the same as the above description, and the specific implementation manner of the reset control circuit controlling the memory controller not to execute the reset operation is similar to the above description, except that the execution object in the above description is the first module, and the execution object in this implementation manner is the whole memory controller, which is not described herein again. In the embodiment of the application, the fault replacement information recorded in the first module indicates the distribution condition of data in the memory in the storage unit, and the memory controller is used for managing the memory, and the first module is integrated in the memory controller, so that the memory controller can conveniently manage the first module and conveniently read the fault replacement information to manage the memory; in addition, the whole memory controller is directly controlled not to be reset, and the problem of asynchronism among different modules in the memory controller after reset is avoided.
205. The reset control circuit sends a reset instruction to the processor core and the first module.
In some embodiments of the present application, when determining that the acquired reset signal is the cold reset signal, the reset control circuit sends a reset instruction to the processor core, the first module, and the memory to control the processor core, the first module, and the memory to all execute the reset operation. Optionally, the reset control circuit may also send a reset instruction to other modules in the reset system.
Further, the reset control circuit may include a logic circuit, and when the reset control circuit acquires the hot reset signal, the output terminal of the reset control circuit is not coupled to the first module; when the reset control circuit acquires a cold reset signal, the output end of the reset control circuit is coupled with the first module.
In the embodiment of the application, when the reset control circuit acquires the cold reset signal, this indicates that the reset operation was triggered by a memory fault. The memory needs to be reset in this case, that is, the data in the memory may be cleared, so it is no longer necessary to guarantee that the data in the memory is not lost. The reset operation is therefore also executed on the first module, so that new fault replacement information can be written into the first module again after the reset operation is completed, which ensures that the whole reset system is in a synchronized state.
Optionally, the first module is integrated in a memory controller, and the reset control circuit controls the processor core and the memory controller to perform a reset operation when determining that the acquired reset signal is a cold reset signal, and the reset control circuit controls the memory to perform the reset operation. The specific implementation manner is similar to that described above, and the difference is that the first module in the above description is replaced by a memory controller, which is not described herein again.
To further understand the present disclosure, please refer to fig. 4, and fig. 4 is a system diagram of a reset system according to an embodiment of the present disclosure. Fig. 4 illustrates the first module integrated into a memory controller integrated into a processor. And under the condition that the reset control circuit acquires the cold reset signal, the reset control circuit sends a reset instruction to the processor core, the memory controller, the HSPHY and the memory so as to trigger the whole reset system and the memory to execute reset operation. Under the condition that the reset control circuit acquires the hot reset signal, the reset control circuit sends a reset instruction to the processor core, and does not send the reset instruction to the memory controller, the HSPHY, and the memory to control the first module not to execute the reset operation.
In the embodiment of the application, a new concept of fault replacement information is provided, and a first register special for storing the fault replacement information is additionally arranged in a reset system; in the process of executing the reset operation, the first module is controlled not to be reset, so that after the reset operation is completed, the fault replacement information in the first register can not be reset, even if part of storage units in the memory are isolated and replaced due to fault replacement processing on the fault storage units in the memory, after the system is reset, which storage units in the memory are isolated fault storage units can be known according to the fault replacement information, so that system downtime caused by accessing the isolated fault storage units is avoided, namely, correct access to the memory can be realized, and data in the memory is not lost on the premise that a fault replacement technology and a reset technology in the memory are used.
Second, the fault replacement information is backed up
In an embodiment of the present application, please refer to fig. 5, where fig. 5 is a schematic workflow diagram of a data processing system according to an embodiment of the present application, and the workflow of the data processing system according to the embodiment of the present application may include:
501. The processor core sends a fault replacement instruction to the memory controller.
502. The memory controller performs fault replacement processing on the storage unit in the memory according to the received fault replacement instruction, and writes fault replacement information into the first register.
In the embodiment of the present application, the specific implementation manners of steps 501 and 502 are similar to the specific implementation manners of steps 201 and 202 in the corresponding embodiment of fig. 2, and reference may be made to the above description, which is not repeated herein.
503. The processor core writes the fault replacement information to the non-volatile storage medium.
In some embodiments of the present application, after the memory controller writes the fault replacement information to the first register in the first module, the processor core may read the fault replacement information from the first register in the first module and write the newly generated fault replacement information to the non-volatile storage medium. The concept of the first module and the failure replacement information has already been introduced in the embodiment corresponding to fig. 2, and is not described herein again. The nonvolatile storage medium may be a hard disk, a Complex Programmable Logic Device (CPLD), an Electrically Erasable Programmable Read Only Memory (EEPROM), or other types of nonvolatile storage media. The non-volatile storage medium may be configured in the same device as the processor core or may be configured in a different device from the processor core. The processor core and the non-volatile storage medium may be in data communication via an internal interface, including but not limited to a bus, or an external interface, including a wired communication interface and a wireless communication interface.
Specifically, regarding the process in which the processor core reads the fault replacement information from the first register: after writing the fault replacement information to the first register, the memory controller sends the processor core a signal indicating that the fault replacement is complete, and the processor core, upon receiving the completion signal, reads the fault replacement information from the first module.
More specifically, as described in step 201 of the embodiment corresponding to fig. 2, a second register is configured in the data processing system. After the memory controller writes the fault replacement information into the first module, it writes information indicating that the fault replacement operation succeeded (that is, the completion signal) into the second register. After reading this information from the second register, the processor core determines that the memory controller has completed the fault replacement operation and copies the fault replacement information from the first register.
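The handshake between steps 502 and 503 can be sketched as a small software model. This is only an illustration under stated assumptions: the class and field names are hypothetical, the non-volatile storage medium is modelled as a plain list, and clearing the completion flag after the copy is an assumption that the text does not state.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FirstModule:
    # First register: holds the most recent fault replacement information.
    first_register: Optional[dict] = None


@dataclass
class DataProcessingSystem:
    first_module: FirstModule = field(default_factory=FirstModule)
    # Second register: set once a fault replacement operation has succeeded.
    second_register_done: bool = False
    # Stand-in for the non-volatile storage medium (hard disk, EEPROM, ...).
    nonvolatile: list = field(default_factory=list)

    def controller_replace(self, faulty: str, backup: str) -> None:
        """Step 502: record the replacement, then signal completion."""
        self.first_module.first_register = {"first": faulty, "second": backup}
        self.second_register_done = True

    def core_backup(self) -> bool:
        """Step 503: copy the information only after the completion flag is set."""
        if not self.second_register_done:
            return False
        self.nonvolatile.append(dict(self.first_module.first_register))
        # Assumption of this sketch: the flag is cleared after the copy.
        self.second_register_done = False
        return True
```

Calling `core_backup()` before `controller_replace()` returns `False` and writes nothing, which mirrors the requirement that the processor core wait for the completion information in the second register.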
504. The processor core writes the first data stored in the second storage unit in the first module into the non-volatile storage medium.
In some embodiments of the present application, the first module may further include at least one second storage unit. When at least one first storage unit is a memory storage cell, the second storage unit in the first module is configured to store the first data of that first storage unit.
When the granularity of a faulty storage unit is a memory storage cell, the memory controller may write the first data in the faulty storage unit into a second storage unit (that is, a backup storage unit) in the first module. After the memory controller writes the fault replacement information into the first register in the first module, the processor core may read the first data from the second storage unit in the first module and write it into the non-volatile storage medium, so that the first data is not lost when the processor core and the first module perform a reset operation.
Specifically, regarding the process in which the processor core reads the first data from the backup storage unit in the first module: after writing the fault replacement information into the first register, the memory controller sends the processor core a signal indicating that the fault replacement is complete, and the processor core reads the first data from the backup storage unit in the first module upon receiving the completion signal. How the processor core determines that the memory controller has completed the fault replacement operation is described in step 503 and is not repeated here.
It should be noted that step 504 is an optional step, and if there is no failed storage unit with the granularity of the memory storage unit cell, step 504 does not need to be executed. If step 504 is executed, the execution sequence between step 503 and step 504 is not limited in the embodiment of the present application, and step 503 may be executed first, and then step 504 may be executed; step 504 may be executed first, and then step 503 may be executed; steps 503 and 504 may also be performed simultaneously.
505. The reset control circuit acquires a reset signal.
In this embodiment of the application, a specific implementation manner of step 505 is similar to that of step 203 in the embodiment corresponding to fig. 2, and reference may be made to the above description, which is not repeated herein.
506. The reset control circuit sends a reset instruction to the processor core and the first module.
In some embodiments of the present application, after the reset control circuit acquires the reset signal, regardless of whether it is a hot reset signal or a cold reset signal, the reset control circuit sends a reset instruction to the processor core and the first module to trigger them to execute the reset operation. Further, if the acquired signal is a hot reset signal, the reset control circuit does not send a reset instruction to the memory, so that the memory does not execute the reset operation; if the acquired signal is a cold reset signal, the reset control circuit sends a reset instruction to the memory to control the memory to execute the reset operation. The form of the reset instruction has already been introduced in the embodiment corresponding to fig. 2 and is not described here again. It should be noted that, although the first module is shown integrated in the memory controller in fig. 5, in practice the first module may also be disposed outside the memory controller, which is not limited here.
Alternatively, if the first module is integrated in a memory controller integrated in a processor, the entire data processing system may be represented as one processor, and the reset control circuit may send a reset instruction to the entire processor after acquiring the reset signal, so as to control the entire processor to perform the reset operation.
507. The processor core determines whether the reset operation is a hot reset operation; if the reset operation is a hot reset operation, the process proceeds to step 508, and if the reset operation is a cold reset operation, the process proceeds to step 509.
In some embodiments of the present application, a third register is further disposed in the reset control circuit, and the third register is configured to record whether the reset signal acquired by the reset control circuit this time is a cold reset signal or a hot reset signal. After receiving the reset instruction sent by the reset control circuit, the processor core queries the information recorded in the third register to determine whether the reset signal triggering the current reset operation is a hot reset signal, that is, whether the current reset operation is a hot reset operation.
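Steps 505 to 507 can be sketched as a toy model in which the reset control circuit records the reset type in the third register and the processor core later queries it. The class name, the function name and the string labels are hypothetical and serve only this illustration.

```python
class ResetControlCircuit:
    """Toy model of steps 505 to 507; all names are illustrative."""

    def __init__(self) -> None:
        # Third register: records "hot" or "cold" for the latest reset signal.
        self.third_register = None

    def on_reset_signal(self, kind: str) -> list:
        if kind not in ("hot", "cold"):
            raise ValueError(f"unknown reset signal: {kind!r}")
        self.third_register = kind
        # Step 506: the processor core and the first module are always reset.
        targets = ["processor_core", "first_module"]
        if kind == "cold":
            # Only a cold reset also resets the memory.
            targets.append("memory")
        return targets


def is_hot_reset(circuit: ResetControlCircuit) -> bool:
    """Step 507: the processor core queries the third register."""
    return circuit.third_register == "hot"
```

Unlike the fig. 4 arrangement, the first module is reset on both paths here, which is why this embodiment relies on the copy of the fault replacement information held in the non-volatile storage medium.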
508. The processor core performs a reset operation on the processor core and the first module.
In some embodiments of the present application, initialization software runs in the processor core, and when the reset is determined to be a hot reset operation, the initialization software performs the reset operation on the processor core and the first module. During reset startup, the initialization software obtains a fault replacement information set from the non-volatile storage medium. Because more than one fault replacement operation may occur while the data processing system is running, and each piece of fault replacement information records the replacement of storage units in one fault replacement operation, the fault replacement information set obtained from the non-volatile storage medium may contain one or more pieces of fault replacement information. The initialization software may be embodied as a Basic Input Output System (BIOS).
Optionally, if step 504 is executed, the initialization software in the processor core further obtains the first data from the non-volatile storage medium during the reset start process.
Specifically, regarding the process in which the processor core performs the reset operation on the first module: in one implementation, the initialization software in the processor core backfills the fault replacement information set into the first register while resetting the first register. In the embodiment of the present application, the processor core obtains the fault replacement information set from the non-volatile storage medium and backfills it directly into the first module during the reset of the first module, so that after the data processing system is reset the memory controller can use the fault replacement information in the first module directly to access the memory correctly. This is simple to operate and easy to implement.
More specifically, after the initialization software in the processor core triggers the execution of the reset operation on the processor core and the first module, the acquired multiple pieces of fault replacement information are respectively refilled into the multiple first registers in the process that the initialization software in the processor core executes the reset operation on the first registers. Since the configuration register only supports hardware writing and the status register supports both hardware writing and software writing, the first register is embodied as a status register in this implementation.
Optionally, if step 504 is executed, the initialization software in the processor core backfills the set of fault replacement information to the first register and backfills the first data to the second storage unit in the first module during the reset operation performed on the first module. The implementation manner of the processor core backfilling the first data to the second storage unit in the first module is similar to the implementation manner of backfilling the fault replacement information to the first register, and details are not repeated here. In this implementation manner, the first data stored in the second storage unit in the first module is also written into the nonvolatile storage medium, and when the first module is reset, the first data is refilled into the first module, so that the first data is prevented from being lost, and the integrity of the data is ensured.
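The backfill implementation can be sketched as follows. This is a minimal model under stated assumptions: the first registers are a list, the second storage units are a dictionary, and the keys `fault_replacement_set` and `first_data` of the non-volatile store are hypothetical names chosen for this sketch.

```python
def hot_reset_backfill(nonvolatile: dict, first_registers: list, second_units: dict) -> None:
    """Reset the first module, then backfill it from non-volatile storage."""
    # The reset itself clears the volatile contents of the first module.
    first_registers.clear()
    second_units.clear()
    # Backfill one piece of fault replacement information per first register.
    for info in nonvolatile.get("fault_replacement_set", []):
        first_registers.append(dict(info))
    # Optional (step 504 was executed): restore the first data of replaced cells.
    second_units.update(nonvolatile.get("first_data", {}))
```

If step 504 was not executed, the `first_data` key is simply absent and the second storage units stay empty after the reset.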
In another implementation, the initialization software in the processor core performs a reset operation on the first module to initialize it, and, according to the fault replacement information set, performs an inverse replacement operation on the data in the storage units of the memory. The inverse replacement operation writes the data in the second storage unit back into the first storage unit, so that the distribution of data among the storage units of the memory is restored to its initial state. Restoring the distribution to the initial state does not mean clearing the data in the memory; it means storing the data in the layout used before the fault replacement was performed. In the embodiment of the present application, a fault in the processor core or the memory controller may also cause a storage unit in the memory to satisfy the fault replacement condition, so after the processor core and the memory controller are reset, such a storage unit may become usable again. Performing the inverse replacement operation after the processor core and the memory controller are reset therefore releases the backup storage unit, which helps prolong the service life of the memory.
More specifically, the initialization software in the processor core performs a reset operation on the first module, so that the set of fault replacement information recorded in the first module is cleared after the first module is initialized. Because each piece of failure replacement information records the replacement relationship between one first storage unit and one second storage unit, the initialization software in the processor core can acquire the position of the first storage unit and the position of the second storage unit according to the failure replacement information, and then rewrite the data stored in one second storage unit to the first storage unit, namely execute inverse replacement operation on the data in the storage units of the memory.
Furthermore, if the granularity of the faulty storage unit is a memory granule, the initialization software in the processor core needs to check the data in the second storage unit by using the data in the parity granule. If the data in the second storage unit is found to contain an error, the data is corrected by using the data in the ECC error correction granule, and the corrected data is then written back into the first storage unit.
Correspondingly, the inverse replacement operation also requires data reorganization of the data in the memory grain.
For further understanding of this solution, refer to fig. 6, which is a schematic diagram of an inverse replacement operation in the data processing method according to the embodiment of the present disclosure. Continuing the example of fig. 3, fig. 6 includes two sub-diagrams, (a) and (b). Sub-diagram (a) shows the data distribution in a memory bank before the inverse replacement operation is performed: after the fault replacement operation, the data of granule 1 in Rank A has been written into the ECC error correction granule of Rank A, and Rank A and Rank B share the ECC error correction granule of Rank B. The inverse replacement operation therefore rewrites the data in the ECC error correction granule of Rank A into granule 1 of Rank A. Sub-diagram (b) shows the data distribution in the memory bank after the inverse replacement operation is performed: the processor core checks the data in the ECC error correction granule of Rank A by using the parity granule of Rank A, finds no error, then reads that data and writes it into granule 1 of Rank A. The processor core further performs data reorganization on Rank A and Rank B, that is, changes the data storage modes of Rank A and Rank B back to two 16+2 storage modes, so that the distribution of data among the storage units of the memory is restored to the initial state.
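The inverse replacement operation can be sketched as a loop over the fault replacement information set. This is a simplified model under stated assumptions: the memory is a dictionary keyed by hypothetical location strings, the parity check is passed in as a `verify` callback, releasing a backup unit is modelled by setting it to `None`, and the ECC correction and data reorganization steps are outside the sketch.

```python
def inverse_replace(memory: dict, replacement_set: list, verify=lambda data: True) -> None:
    """Move data from each backup (second) unit back to its original (first) unit."""
    for info in replacement_set:
        first, second = info["first"], info["second"]
        data = memory[second]
        if not verify(data):
            # The text corrects such data with the ECC granule before writing
            # it back; that correction is not modelled here.
            raise ValueError(f"data in {second} failed verification")
        memory[first] = data   # rewrite the data into the first storage unit
        memory[second] = None  # the backup unit is released
```

For example, with one entry mapping `rank_a/granule_1` to `rank_a/ecc_granule`, the data held in the ECC granule location moves back to granule 1 and the ECC granule location is freed for its original purpose.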
It should be noted that, in the embodiment of the present application, the number of execution times between steps 501 to 504 and steps 505 to 508 is not limited, and steps 505 to 508 may be executed once after steps 501 to 504 are executed multiple times.
509. The processor core does not retrieve the set of fault replacement information from the non-volatile storage medium.
In some embodiments of the present application, when the processor core determines that the current reset operation is a cold reset operation, the processor core does not acquire the failure replacement information set from the nonvolatile storage medium, but directly performs the reset operation on the processor core, the first module, the memory controller, and the memory, that is, initializes the entire data processing system.
In the embodiment of the present application, when the reset control circuit acquires a cold reset signal, this indicates that the reset operation was triggered by a fault in the memory, so the memory needs to be reset and the data in it may be cleared. There is then no longer any need to ensure that the data in the memory is not lost, so the fault replacement information set is not obtained from the non-volatile storage medium. This avoids a redundant step and improves the efficiency of the reset process.
It should be noted that steps 507 and 509 are optional steps, and if steps 507 and 509 are not executed, step 508 may be directly executed after step 505 is executed.
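For the case in which steps 507 and 509 are executed, the branch between steps 508 and 509 can be sketched as a single decision. The function name `boot_after_reset` and the key `fault_replacement_set` are hypothetical, and the reset type is assumed to have been read from the third register beforehand.

```python
def boot_after_reset(third_register: str, nonvolatile: dict) -> list:
    """Return the fault replacement information to restore after a reset."""
    if third_register == "hot":
        # Step 508: memory contents survive, so the replacement set is needed.
        return list(nonvolatile.get("fault_replacement_set", []))
    # Step 509: a cold reset re-initializes the memory; the set is not read.
    return []
```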
The embodiment of the present application introduces the concept of fault replacement information and adds to the reset system a first register dedicated to storing it. After the memory controller writes the fault replacement information into the first module, the processor core writes the newly generated fault replacement information into the non-volatile storage medium, so that resetting the data processing system does not cause the fault replacement information to be lost. Even if some storage units in the memory have been isolated and replaced by fault replacement processing, after the system is reset the fault replacement information still identifies which storage units are isolated faulty storage units. This avoids system downtime caused by accessing an isolated faulty storage unit; that is, the memory can be accessed correctly, and data in the memory is not lost when both the fault replacement technology and the reset technology are used.
On the basis of the embodiments corresponding to fig. 1 to 6, in order to better implement the above-mentioned scheme of the embodiments of the present application, related equipment for implementing the scheme is also provided below. Referring to fig. 7, fig. 7 is a system diagram of a reset system according to an embodiment of the present disclosure. The reset system 700 includes a reset control circuit 701, a processor core 7021, and a first module 703. The first module 703 includes a first register used to store fault replacement information, where the fault replacement information includes location information of a first storage unit, and the first storage unit is the faulty storage unit when fault replacement is performed on storage units in a memory. The reset control circuit 701 is configured to acquire a hot reset signal. The reset control circuit 701 is further configured to send a reset instruction to the second module 702 in response to the acquired hot reset signal, where the second module 702 includes the processor core 7021 and does not include the first module 703, and the reset instruction is used to trigger execution of a reset operation.
In one possible design, the failure replacement information further includes location information of a second storage unit, where the second storage unit is a backup storage unit when the storage unit in the memory is replaced with the failure.
In one possible design, the granularity of the first storage unit is any one of: memory storage unit cells, memory rows, memory blocks, memory granules, memory planes and memory banks.
In one possible design, the first module 703 further includes at least one second storage unit, and the second storage unit in the first module 703 is used for storing data in the first storage unit which is a memory storage cell if the at least one first storage unit is the memory storage cell.
In one possible design, referring to fig. 8, fig. 8 is a system diagram of a reset system according to an embodiment of the present disclosure. The reset system 700 includes a memory controller 704, and the first module 703 is integrated in the memory controller 704. The reset control circuit 701 is specifically configured to control the processor core 7021 to perform a reset operation, and to control the memory controller 704 not to perform the reset operation.
In one possible design, the reset control circuit 701 is further configured to send a reset instruction to the processor core 7021 and the first module 703 in case of acquiring a cold reset signal.
In one possible design, the reset control circuit 701 is specifically configured to send a reset instruction to the processor core 7021, and send a first instruction to the first module 703, where the first instruction instructs the first module 703 not to perform a reset operation.
It should be noted that, the contents of information interaction, execution process, and the like between the modules/units in the reset system 700 are based on the same concept as the method embodiments corresponding to fig. 2 to fig. 4 in the present application, and specific contents may refer to the description in the foregoing method embodiments in the present application, and are not described herein again.
Referring to fig. 9, fig. 9 is a schematic system diagram of a data processing system according to an embodiment of the present application. The data processing system 900 includes a processor core 901 and a first module 902, where the first module 902 includes a first register used to store fault replacement information, the fault replacement information includes location information of a first storage unit, and the first storage unit is the faulty storage unit when fault replacement is performed on storage units in a memory. The processor core 901 is configured to obtain the fault replacement information from the first register; the processor core 901 is further configured to write the fault replacement information into the non-volatile storage medium, so that the fault replacement information is not lost when the processor core 901 and the first module 902 perform the reset operation.
In one possible design, the failure replacement information further includes location information of a second storage unit, where the second storage unit is a backup storage unit when the storage unit in the memory is replaced with the failure.
In one possible design, the granularity of the first storage unit is any one of: memory storage unit cells, memory rows, memory blocks, memory granules, memory planes and memory banks.
In one possible design, please refer to fig. 10, in which fig. 10 is a system diagram of a data processing system according to an embodiment of the present disclosure. The system 900 includes a memory controller 903, and a first module 902 is integrated in the memory controller 903.
In one possible design, the processor core 901 is further configured to, in a case where the reset operation is a hot reset operation, obtain a set of fault replacement information from the non-volatile storage medium, and backfill the set of fault replacement information into the first register during the reset of the first register, where the set of fault replacement information includes at least one piece of fault replacement information.
In one possible design, the first module 902 further includes at least one second storage unit, and the second storage unit in the first module 902 is used to store the first data in the first storage unit of the memory storage unit cell in the case that the at least one first storage unit is the memory storage unit cell. The processor core 901 is further configured to obtain first data from a second storage unit in the first module 902, and write the first data into a nonvolatile storage medium, so that the first data is not lost when the processor core 901 and the first module 902 perform a reset operation; the processor core 901 is further configured to, if the reset operation is a hot reset operation, obtain a failure replacement information set and first data from the nonvolatile storage medium, during the reset process of the first module 902, refill the failure replacement information set into the first register, and refill the first data into a second storage unit in the first module 902, where the failure replacement information set includes at least one piece of failure replacement information.
In one possible design, the processor core 901 is further configured to perform a reset operation on the first module 902 to initialize the first module 902; the processor core 901 is further configured to, if the reset operation is a hot reset operation, obtain a failure replacement information set from the nonvolatile storage medium, and perform a reverse replacement operation on data in the storage unit of the memory according to the failure replacement information set, where the failure replacement information set includes at least one failure replacement information, and the reverse replacement operation is configured to rewrite the data in the second storage unit into the first storage unit, so as to restore the distribution of the data in the storage unit in the memory to an initial state.
In one possible design, processor core 901 is further configured to not retrieve the failure replacement information set from the non-volatile storage medium if the reset operation is a cold reset operation.
It should be noted that, the information interaction, the execution process, and other contents between the modules/units in the data processing system 900 are based on the same concept as that of the method embodiments corresponding to fig. 5 and fig. 6 in the present application, and specific contents may refer to the description in the foregoing method embodiments in the present application, and are not described herein again.
Referring to fig. 11, fig. 11 is a schematic structural diagram of a computer device according to an embodiment of the present application. The reset system 700 described in the embodiment corresponding to fig. 7 or fig. 8 may be disposed on the computer device 110 to implement the functions of the reset system in the embodiments corresponding to fig. 2 to fig. 4. Alternatively, the data processing system 900 described in the embodiment corresponding to fig. 9 or fig. 10 may be disposed on the computer device 110 to implement the functions of the data processing system in the embodiment corresponding to fig. 5 or fig. 6. Specifically, the computer device 110 includes: a wired or wireless network interface 1101, an input-output interface 1102, a processor 1103, and a memory 1104 (the number of processors 1103 in the computer device 110 may be one or more, and one processor is taken as an example in fig. 11). The processor 1103 may include an application processor 11031 and a communication processor 11032. The memory 1104 may include a non-volatile storage medium 11041 and a memory 11042. In some embodiments of the present application, the wired or wireless network interface 1101, the input-output interface 1102, the processor 1103, and the memory 1104 may be connected by a bus or other means.
The memory 11042 may include read-only memory and random access memory, and provides instructions and data to the processor 1103. A portion of the non-volatile storage medium 11041 may also include non-volatile random access memory (NVRAM). The memory 1104 stores operating instructions, executable modules or data structures, or a subset or an extended set thereof, where the operating instructions may include various operating instructions for implementing various operations.
The processor 1103 controls the operation of the computer device. In a particular application, the various components of a computer device are coupled together by a bus system that may include a power bus, a control bus, a status signal bus, etc., in addition to a data bus. For clarity of illustration, the various buses are referred to in the figures as a bus system.
The method disclosed in the embodiments of the present application can be applied to the processor 1103 or implemented by the processor 1103. The processor 1103 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or instructions in software form in the processor 1103. The processor 1103 may be a general-purpose processor, a Digital Signal Processor (DSP), a microprocessor or a microcontroller, and may further include an Application Specific Integrated Circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic device, or discrete hardware components. The processor 1103 may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in ram, flash memory, rom, prom, or eprom, registers, etc. storage media as is well known in the art. The storage medium is located in the memory 1104, and the processor 1103 reads the information in the memory 1104 and performs the steps of the method in combination with the hardware.
The wired or wireless network interface 1101 is used to implement signal transmission and signal reception functions of the computer device 110. The input/output interface 1102 may be used to receive input numeric or character information and output numeric or character information; the input/output interface 1102 is further operable to send a command to the disk group via the first interface to modify data in the disk group; the input/output interface 1102 may also include a display device such as a display screen.
In this embodiment, in one case, the application processor 11031 is configured to implement the functions of the reset system in the embodiments corresponding to fig. 2 to fig. 4. It should be noted that, for the specific implementation and the beneficial effects of the application processor 11031 executing the functions of the reset system in the embodiments corresponding to fig. 2 to fig. 4, reference may be made to the descriptions in the method embodiments corresponding to fig. 2 to fig. 4, and details are not repeated here.
In this embodiment, the application processor 11031 is used to realize the functions of the data processing system in the corresponding embodiment of fig. 5 or fig. 6 in another case. It should be noted that, for the specific implementation manner and the beneficial effects brought by the application processor 11031 executing the functions of the data processing system in the embodiment corresponding to fig. 5 or fig. 6, reference may be made to the description in each method embodiment corresponding to fig. 5 or fig. 6, and details are not repeated here.
Also provided in an embodiment of the present application is a computer-readable storage medium having a program stored therein which, when run on a computer, causes the computer to execute the steps executed by the reset system in the method described in the foregoing embodiments shown in fig. 2 to fig. 4, or the steps executed by the data processing system in the method described in the foregoing embodiments shown in fig. 5 or fig. 6.
Embodiments of the present application further provide a computer program product, which when running on a computer, causes the computer to execute the steps executed by the reset system in the method described in the foregoing embodiments shown in fig. 2 to 4, or execute the steps executed by the data processing system in the method described in the foregoing embodiments shown in fig. 5 or fig. 6.
Further provided in embodiments of the present application is a circuit system, which includes a processing circuit configured to perform steps performed by a reset system in the method described in the foregoing embodiments shown in fig. 2 to 4, or perform steps performed by a data processing system in the method described in the foregoing embodiments shown in fig. 5 or 6.
The reset system or the data processing system provided by the embodiment of the present application may specifically be a chip, and the chip includes: a processing unit, which may be for example a processor, and a communication unit, which may be for example an input/output interface, a pin or a circuit, etc. The processing unit can execute the computer-executable instructions stored in the storage unit to enable the chip to execute the reset method described in the embodiments shown in fig. 2 to 4 or the data processing method described in the embodiments shown in fig. 5 or fig. 6. Optionally, the storage unit is a storage unit in the chip, such as a register, a cache, and the like, and the storage unit may also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a Random Access Memory (RAM), and the like.
Wherein any of the aforementioned processors may be a general purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits configured to control the execution of the programs of the method of the first aspect.
It should be noted that the above-described embodiments of the apparatus are merely schematic, where the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. In addition, in the drawings of the embodiments of the apparatus provided in the present application, the connection relationship between the modules indicates that there is a communication connection therebetween, and may be implemented as one or more communication buses or signal lines.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present application can be implemented by software plus the necessary general-purpose hardware, and can certainly also be implemented by dedicated hardware including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components and the like. Generally, any function performed by a computer program can easily be implemented by corresponding hardware, and the specific hardware structures used to implement the same function may take various forms, such as analog circuits, digital circuits, or dedicated circuits. For the present application, however, a software implementation is preferable in most cases. Based on such understanding, the technical solutions of the present application may essentially be embodied in the form of a software product. The software product is stored in a readable storage medium, such as a floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods described in the embodiments of the present application.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or a wireless manner (e.g., infrared, radio, microwave). The computer-readable storage medium can be any usable medium that a computer can access, or a data storage device, such as a server or a data center, that integrates one or more usable media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)), among others.

Claims (29)

1. A reset system is characterized by comprising a reset control circuit, a processor core and a first module, wherein the first module comprises a first register, the first register is used for storing fault replacement information, the fault replacement information comprises position information of a first storage unit, and the first storage unit is a storage unit which has a fault when the storage unit in a memory is subjected to fault replacement;
the reset control circuit is configured to acquire a hot reset signal;
the reset control circuit is further configured to send a reset instruction to a second module in response to the acquired hot reset signal, where the second module includes the processor core and does not include the first module, and the reset instruction is used to trigger execution of a reset operation.
2. The system according to claim 1, wherein the fault replacement information further includes location information of a second storage unit, and the second storage unit is the backup storage unit used when the storage unit in the memory is subjected to fault replacement.
3. The system of claim 1, wherein the granularity of the first storage unit is any one of: memory storage unit cells, memory rows, memory blocks, memory granules, memory planes and memory banks.
4. The system of claim 2, wherein the first module further comprises at least one second storage unit, and the second storage unit in the first module is configured to store data in the first storage unit which is a memory storage cell if at least one first storage unit is the memory storage cell.
5. The system of any one of claims 1 to 4, wherein the system comprises a memory controller, the first module being integrated in the memory controller;
the reset control circuit is specifically configured to send a reset instruction to the processor core, and not send the reset instruction to the memory controller.
6. The system according to any one of claims 1 to 4,
the reset control circuit is further configured to send a reset instruction to the processor core and the first module when a cold reset signal is acquired.
7. A data processing system comprises a processor core and a first module, wherein the first module comprises a first register, the first register is used for storing fault replacement information, the fault replacement information comprises position information of a first storage unit, and the first storage unit is a storage unit which has faults when the storage unit in a memory is subjected to fault replacement;
the processor core is used for acquiring the fault replacement information from the first register;
the processor core is further configured to write the fault replacement information into a non-volatile storage medium.
8. The system according to claim 7, wherein the fault replacement information further includes location information of a second storage unit, and the second storage unit is the backup storage unit used when the storage unit in the memory is subjected to fault replacement.
9. The system of claim 7, wherein the granularity of the first storage unit is any one of: memory storage unit cells, memory rows, memory blocks, memory granules, memory planes and memory banks.
10. The system of any one of claims 7 to 9, wherein the system comprises a memory controller, and wherein the first module is integrated into the memory controller.
11. The system according to any one of claims 7 to 9,
the processor core is further configured to, if the reset operation is a hot reset operation, obtain a fault replacement information set from the non-volatile storage medium, and backfill the fault replacement information set to the first register during the reset of the first register, where the fault replacement information set includes at least one piece of the fault replacement information.
12. The system of claim 8, wherein the first module further comprises at least one second storage unit, and the second storage unit in the first module is configured to store the first data in the first storage unit which is a memory storage cell if the at least one first storage unit is the memory storage cell;
the processor core is further configured to obtain the first data from a second storage unit in the first module, and write the first data into the nonvolatile storage medium, so that the first data is not lost when the processor core and the first module perform a reset operation;
the processor core is further configured to, if the reset operation is a hot reset operation, obtain a fault replacement information set and the first data from the non-volatile storage medium, and during a reset process of the first module, backfill the fault replacement information set to the first register and backfill the first data to the second storage unit in the first module, where the fault replacement information set includes at least one piece of the fault replacement information.
13. The system of claim 8,
the processor core is further configured to perform a reset operation on the first module to initialize the first module;
the processor core is further configured to, if the reset operation is a hot reset operation, acquire a fault replacement information set from the non-volatile storage medium, and perform a reverse replacement operation on data in the storage units of the memory according to the fault replacement information set, where the fault replacement information set includes at least one piece of the fault replacement information, and the reverse replacement operation is used to rewrite the data in the second storage unit into the first storage unit, so as to restore the distribution of the data across the storage units in the memory to an initial state.
14. The system of claim 11,
the processor core is further configured not to obtain the fault replacement information set from the non-volatile storage medium if the reset operation is a cold reset operation.
15. A reset method is applied to a reset system, the system comprises a reset control circuit, a processor core and a first module, the first module comprises a first register, the first register is used for storing fault replacement information, the fault replacement information comprises position information of a first storage unit, and the first storage unit is a storage unit which has a fault when the storage unit in a memory is subjected to fault replacement;
the reset control circuit acquires a hot reset signal;
the reset control circuit, in response to the acquired hot reset signal, sends a reset instruction to a second module, where the second module includes the processor core and does not include the first module, and the reset instruction is used to trigger execution of a reset operation.
16. The method according to claim 15, wherein the fault replacement information further includes location information of a second storage unit, and the second storage unit is the backup storage unit used when the storage unit in the memory is subjected to fault replacement.
17. The method of claim 15, wherein the granularity of the first storage unit is any one of: memory storage unit cells, memory rows, memory blocks, memory granules, memory planes and memory banks.
18. The method of claim 16, wherein the first module further comprises at least one second storage unit, and the second storage unit in the first module is used for storing data in the first storage unit which is a memory storage cell if at least one first storage unit is the memory storage cell.
19. The method of any one of claims 15 to 18, wherein the system comprises a memory controller, the first module being integrated into the memory controller;
the reset control circuit sends a reset instruction to the processor core and does not send a reset instruction to the first module, including:
the reset control circuit sends a reset instruction to the processor core and does not send a reset instruction to the memory controller.
20. The method of any one of claims 15 to 18, further comprising:
and the reset control circuit sends a reset instruction to the processor core and the first module under the condition of acquiring a cold reset signal.
21. A data processing method is applied to a data processing system, the data processing system comprises a processor core and a first module, the first module comprises a first register, the first register is used for storing fault replacement information, the fault replacement information comprises position information of a first storage unit, and the first storage unit is a storage unit which has a fault when the storage unit in a memory is subjected to fault replacement;
the processor core acquires the fault replacement information from the first register;
and the processor core writes the fault replacement information into a nonvolatile storage medium.
22. The method according to claim 21, wherein the fault replacement information further includes location information of a second storage unit, and the second storage unit is the backup storage unit used when the storage unit in the memory is subjected to fault replacement.
23. The method of claim 21, wherein the granularity of the first storage unit is any one of: memory storage unit cells, memory rows, memory blocks, memory granules, memory planes and memory banks.
24. The method of any of claims 21 to 23, wherein the system comprises a memory controller, and wherein the first module is integrated into the memory controller.
25. The method of any one of claims 21 to 23, further comprising:
when the reset operation is a hot reset operation, the processor core acquires a fault replacement information set from the nonvolatile storage medium, and backfills the fault replacement information set to the first register in the reset process of the first module, wherein the fault replacement information set comprises at least one piece of the fault replacement information.
26. The method according to claim 22, wherein the first module further comprises at least one second storage unit, and the second storage unit in the first module is used for storing the first data in the first storage unit which is the memory storage unit cell if at least one first storage unit is the memory storage unit cell;
the method further comprises the following steps:
the processor core acquires the first data from a second storage unit in the first module and writes the first data into the nonvolatile storage medium, so that the first data is not lost when the processor core and the first module perform reset operation;
and in the case that the reset operation is a hot reset operation, the processor core acquires a fault replacement information set and the first data from the nonvolatile storage medium, and, in the reset process of the first module, backfills the fault replacement information set to the first register and backfills the first data to the second storage unit in the first module, wherein the fault replacement information set comprises at least one piece of the fault replacement information.
27. The method of claim 22, further comprising:
the processor core executes a reset operation on the first module to initialize the first module;
and under the condition that the reset operation is a hot reset operation, the processor core acquires the fault replacement information set from the nonvolatile storage medium and executes a reverse replacement operation on data in the storage unit of the memory according to the fault replacement information set, wherein the fault replacement information set comprises at least one piece of fault replacement information, and the reverse replacement operation is used for rewriting the data in the second storage unit into the first storage unit so as to restore the distribution condition of the data in the memory in the storage unit to an initial state.
28. The method of claim 25, further comprising:
in the event that the reset operation is a cold reset operation, the processor core does not retrieve the set of fault replacement information from the non-volatile storage medium.
29. A computer device, characterized in that the computer device is provided with a reset system according to any one of claims 1 to 6 or with a data processing system according to any one of claims 7 to 14.
CN202010588804.3A 2020-06-24 2020-06-24 Reset system, data processing system and related equipment Pending CN113835923A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010588804.3A CN113835923A (en) 2020-06-24 2020-06-24 Reset system, data processing system and related equipment
PCT/CN2021/102029 WO2021259351A1 (en) 2020-06-24 2021-06-24 Reset system, data processing system, and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010588804.3A CN113835923A (en) 2020-06-24 2020-06-24 Reset system, data processing system and related equipment

Publications (1)

Publication Number Publication Date
CN113835923A true CN113835923A (en) 2021-12-24

Family

ID=78964602

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010588804.3A Pending CN113835923A (en) 2020-06-24 2020-06-24 Reset system, data processing system and related equipment

Country Status (2)

Country Link
CN (1) CN113835923A (en)
WO (1) WO2021259351A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023020031A1 (en) * 2021-08-17 2023-02-23 华为技术有限公司 Memory fault recovery method, system, and memory
WO2024016864A1 (en) * 2022-07-19 2024-01-25 华为技术有限公司 Processor, information acquisition method, single board and network device

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116820830A (en) * 2022-03-22 2023-09-29 华为技术有限公司 Data writing method and processing system
CN115168087B (en) * 2022-07-08 2024-03-19 超聚变数字技术有限公司 Method and device for determining repair resource granularity of memory failure
CN118585051A (en) * 2023-03-01 2024-09-03 华为技术有限公司 Electronic equipment and related reset recovery method

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102736957B (en) * 2012-05-25 2015-07-08 华为技术有限公司 Resetting method and device
CN103116551B (en) * 2013-01-31 2016-05-04 苏州国芯科技有限公司 Be applied to the NorFLASH store interface module of CLB bus
CN103235760B (en) * 2013-01-31 2016-05-04 苏州国芯科技有限公司 High usage NorFLASH memory interface chip based on CLB bus
JP6742825B2 (en) * 2016-06-06 2020-08-19 キヤノン株式会社 Control device and control method
CN107678420B (en) * 2017-09-30 2020-01-31 北京理工大学 engine data online storage method

Also Published As

Publication number Publication date
WO2021259351A1 (en) 2021-12-30

Similar Documents

Publication Publication Date Title
CN113835923A (en) Reset system, data processing system and related equipment
US10824499B2 (en) Memory system architectures using a separate system control path or channel for processing error information
US8495432B2 (en) Blocking write access to memory modules of a solid state drive
KR20180087256A (en) Predictive memory maintenance
US8902671B2 (en) Memory storage device, memory controller thereof, and method for programming data thereof
CN106463179A (en) Method, apparatus and system for handling data error events with memory controller
US10338844B2 (en) Storage control apparatus, control method, and non-transitory computer-readable storage medium
US20150095552A1 (en) Memory system for mirroring data
US20100241900A1 (en) System to determine fault tolerance in an integrated circuit and associated methods
EP4379553A1 (en) Memory fault recovery method, system, and memory
US11561871B2 (en) Data transmission and protection system and method thereof
CN114868117A (en) Peer-to-peer storage device messaging over a control bus
WO2013080299A1 (en) Data management device, data copy method, and program
WO2021088368A1 (en) Method and device for repairing memory
CN114579163A (en) Disk firmware upgrading method, computing device and system
US11340826B2 (en) Systems and methods for strong write consistency when replicating data
US20230236930A1 (en) Crc raid recovery from hard failure in memory systems
US11640335B2 (en) Multiple function level reset management
CN117642716A (en) Recovery from HMB loss
US20230386598A1 (en) Methods for real-time repairing of memory failures caused during operations, memory systems performing repairing methods, and data processing systems including repairing memory systems
CN112346922B (en) Server device and communication protocol method thereof
CN113868000B (en) Link fault repairing method, system and related components
CN113535459B (en) Data access method and device for responding to power event
WO2024169645A1 (en) Memory error correction method, system and device
CN116483630A (en) Memory fault repairing method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination