WO2021259351A1 - Reset system, data processing system, and related device - Google Patents

Reset system, data processing system, and related device

Info

Publication number
WO2021259351A1
Authority
WO
WIPO (PCT)
Prior art keywords
memory
storage unit
reset
module
replacement information
Application number
PCT/CN2021/102029
Other languages
French (fr)
Chinese (zh)
Inventor
刁阳彬
韩林
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)

Classifications

    • G06F11/1048 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's, in individual solid state devices using arrangements adapted for a specific error detection or correction feature
    • G06F11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F12/02 Addressing or allocation; Relocation
    • G11C29/52 Protection of memory contents; Detection of errors in memory contents

Definitions

  • This application relates to the field of computer technology, and in particular to a reset system, a data processing system, and related devices.
  • Fault replacement processing refers to writing the data in a failed storage unit of the memory into a backup storage unit of the memory, thereby isolating the failed storage unit.
  • This application provides a reset system, a data processing system, and related devices. It introduces the concept of fault replacement information and adds a first register dedicated to recording it, so that the fault replacement information is not lost during a reset. After the reset operation completes, the data in the memory can therefore be accessed correctly according to the fault replacement information, and memory data is not lost even when both fault replacement and reset are used on the memory.
  • In a first aspect, this application provides a reset system that can be used for managing memory data.
  • The reset system includes a reset control circuit, a processor core, and a first module.
  • The first module includes a first register that stores fault replacement information. One piece of fault replacement information includes the location information of the first storage unit involved in one fault replacement operation, where the first storage unit is the faulty storage unit in the memory.
  • The first register may specifically be a status register or a configuration register.
  • The reset control circuit is configured to obtain a warm (hot) reset signal and, in response to it, send a reset instruction to a second module.
  • The second module includes the processor core but not the first module; that is, the reset control circuit sends the reset instruction to the processor core but not to the first module.
  • The reset instruction triggers the reset operation. Because the first module is not reset, the fault replacement information in the first register is not cleared when the reset operation completes.
  • Reset refers to restoring a module, unit, or device to the state it was in when it was first powered on.
  • The warm reset signal is used to trigger a warm reset operation.
  • The reset instruction may be a set of low-level signals containing at least one low-level signal, or a set of electrical signals containing both low-level and high-level signals.
  • In this design, the concept of fault replacement information is introduced and a first register dedicated to storing it is added to the reset system. During a reset operation the first module is kept out of reset, so the fault replacement information in the first register survives the reset. Even if fault replacement processing has isolated and replaced some storage units in the memory, after the system is reset it can determine from this information which storage units are isolated faulty units, and thus avoid the system crash that accessing an isolated faulty unit would cause. The memory is therefore accessed correctly, and memory data is not lost when both fault replacement and reset are used.
  • The first module records one or more pieces of fault replacement information, and each piece further includes the location information of the second storage unit involved in the corresponding fault replacement operation.
  • The second storage unit is the backup storage unit used when a storage unit in the memory undergoes fault replacement, that is, the storage unit that takes the place of the faulty one.
  • The fault replacement information therefore includes at least the location of the replaced (faulty) storage unit and the location of the replacing (backup) storage unit. Recording each fault replacement operation this way shows both which storage units in the memory have been replaced and isolated because of faults and which storage unit now holds the relocated data, giving a direct view of how the memory data is distributed across storage units.
  • One piece of fault replacement information also includes the fault replacement type of the operation, which can be any of the following: memory module replacement, memory plane (rank) replacement, memory particle (device) replacement, memory block (bank) replacement, memory row replacement, or memory storage cell replacement.
  • The granularity of the faulty first storage unit is any one of the following: memory storage cell, memory row, memory block, memory particle, memory plane, or memory module.
  • The memory storage cell is the smallest-granularity storage unit in the memory.
  • A memory row includes multiple memory storage cells, a memory block includes multiple memory rows, a memory particle includes multiple memory blocks, a memory plane includes multiple memory particles, and a memory module includes one or two memory planes.
  • Because the granularity of a storage unit can be any of these, the fault replacement information can describe a fault replacement operation at any of these granularities; that is, this solution supports fault replacement at any granularity, which makes it more flexible to implement.
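As an illustration only, a piece of fault replacement information and the six granularities listed above could be modelled as below. This is a minimal Python sketch, not the applicant's register layout: the names `Granularity`, `FaultReplacementInfo`, `is_isolated`, and `backup_for`, and the location strings, are all hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Iterable


class Granularity(IntEnum):
    """Storage-unit granularities from the text, ordered smallest to largest."""
    CELL = 0    # memory storage cell
    ROW = 1     # memory row
    BANK = 2    # memory block (bank)
    DEVICE = 3  # memory particle (device)
    RANK = 4    # memory plane (rank)
    MODULE = 5  # memory module


@dataclass(frozen=True)
class FaultReplacementInfo:
    """One record per fault replacement operation (hypothetical layout)."""
    replacement_type: Granularity  # granularity of the replacement performed
    failed_location: str           # first storage unit: faulty and isolated
    backup_location: str           # second storage unit: now holds the data


def is_isolated(records: Iterable[FaultReplacementInfo], location: str) -> bool:
    """Return True if `location` was isolated by any recorded replacement."""
    return any(r.failed_location == location for r in records)


def backup_for(records: Iterable[FaultReplacementInfo], location: str) -> str:
    """Return where the data of `location` now lives (itself if never replaced)."""
    for r in records:
        if r.failed_location == location:
            return r.backup_location
    return location
```

Keeping the faulty and backup locations in one record is what lets the system, after a reset, both avoid the isolated unit and find where its data went.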
  • The first module further includes at least one second storage unit. When the at least one first storage unit includes a unit whose granularity is a memory storage cell, a second storage unit in the first module stores the data of that faulty cell.
  • The second storage units configured in the first module may have the granularity of a memory storage cell, a memory row, or another granularity, and the first module may contain, for example, 32, 64, or 128 of them.
  • When such a faulty memory storage cell exists, its data is written into a backup storage unit in the first module. Because the reset instruction is not sent to the first module during the reset, the data in this backup storage unit is not cleared after the reset operation completes, which ensures data integrity.
  • The system includes a memory controller, and the first module is integrated in the memory controller.
  • The reset control circuit is specifically configured to send the reset instruction to the processor core and not to the memory controller.
  • Integrating the first module into the memory controller makes it easy for the memory controller to manage the first module and to read the fault replacement information when managing the memory. In addition, keeping the entire memory controller out of reset avoids loss of synchronization between different modules inside the memory controller after the reset.
  • The reset control circuit is further configured to send the reset instruction to both the processor core and the first module when a cold reset signal is obtained.
  • The cold reset signal is used to trigger a cold reset operation.
  • A cold reset operation restores the entire reset system and the memory to their initial power-on state and is generally performed by powering off and on again.
  • If the reset control circuit obtains a cold reset signal, this indicates that the reset was triggered by a memory fault.
  • The memory itself then has to be reset, which clears its data, so there is no need to preserve the data in the memory.
  • The first module is therefore also reset, so that after the reset operation completes, new fault replacement information can be written into the first module again and the entire reset system stays synchronized.
  • The reset control circuit may include a logic circuit.
  • When the reset control circuit obtains a warm reset signal, its output terminal is not coupled to the first module;
  • when it obtains a cold reset signal, its output terminal is coupled to the first module.
  • The reset control circuit is further configured to send a first instruction to the first module, instructing the first module not to perform a reset operation.
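The reset routing described above can be sketched as a small Python simulation, assuming a software model in which each module exposes a `reset()` method. The names `ResetSignal`, `ProcessorCore`, `FirstModule`, and `reset_control_circuit` are hypothetical; in hardware this routing is done by the logic circuit's output coupling, not by code.

```python
from enum import Enum, auto


class ResetSignal(Enum):
    WARM = auto()  # warm (hot) reset: memory contents must survive
    COLD = auto()  # cold reset: system and memory return to the power-on state


class ProcessorCore:
    def __init__(self) -> None:
        self.state: dict[str, int] = {}

    def reset(self) -> None:
        self.state.clear()  # restore the power-on state


class FirstModule:
    """Holds the first register; cleared only if it receives a reset instruction."""

    def __init__(self) -> None:
        self.first_register: list[str] = []  # fault replacement information

    def reset(self) -> None:
        self.first_register.clear()


def reset_control_circuit(signal: ResetSignal, core: ProcessorCore,
                          first_module: FirstModule) -> None:
    """Send the reset instruction to the modules selected by the signal type."""
    core.reset()  # the processor core (second module) is reset in both cases
    if signal is ResetSignal.COLD:
        # Memory is reset too, so the old replacement records are discarded.
        first_module.reset()
    # On a warm reset no instruction reaches the first module, so its
    # first register keeps the fault replacement information.
```

In this model the only difference between the two signals is whether `first_module.reset()` is called, mirroring whether the circuit's output is coupled to the first module.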
  • In a second aspect, this application provides a data processing system that can be used for managing memory data.
  • The data processing system includes a processor core and a first module.
  • The first module includes a first register.
  • The first register is used to store fault replacement information.
  • The fault replacement information includes the location information of the first storage unit.
  • The first storage unit is the faulty storage unit in the memory that undergoes fault replacement.
  • The processor core is configured to obtain the fault replacement information from the first register and write it into a non-volatile storage medium, so that the fault replacement information is not lost when the processor core and the first module perform a reset operation.
  • In this design, the concept of fault replacement information is introduced and a first register dedicated to storing it is added to the system.
  • The processor core writes newly generated fault replacement information into the non-volatile storage medium, so resetting the data processing system does not lose it, even though fault replacement processing of faulty storage units has isolated and replaced part of the storage units in the memory.
  • After the system is reset, it can determine from this fault replacement information which storage units in the memory are isolated faulty units, and so avoid the system crash that accessing them would cause. The memory is therefore accessed correctly, and memory data is not lost when both fault replacement and reset are used.
  • The fault replacement information further includes the location information of the second storage unit, which is the backup storage unit used when a storage unit in the memory undergoes fault replacement.
  • The granularity of the faulty storage unit is any one of the following: memory storage cell, memory row, memory block, memory particle, memory plane, or memory module.
  • The system includes a memory controller, and the first module is integrated in the memory controller.
  • The processor core is further configured to, when the reset operation is a warm reset, obtain a fault replacement information set from the non-volatile storage medium and backfill the set into the first register while the first module is being reset.
  • The fault replacement information set includes at least one piece of fault replacement information.
  • Because the processor core backfills the set read from the non-volatile storage medium directly into the first module during the first module's reset, after the data processing system is reset the memory controller can use the fault replacement information in the first module to access the memory accurately. This is simple to operate and easy to implement.
  • The first module further includes at least one second storage unit. When the at least one first storage unit includes a unit whose granularity is a memory storage cell, the second storage unit in the first module stores the data of that cell, referred to as the first data.
  • The processor core is further configured to obtain the first data from the second storage unit in the first module and write it into the non-volatile storage medium, so that the first data is not lost when the processor core and the first module perform a reset operation.
  • The processor core is further configured to, when the reset operation is a warm reset, obtain the fault replacement information set and the first data from the non-volatile storage medium and, during the reset of the first module, backfill the set into the first register and the first data into the second storage unit in the first module. The fault replacement information set includes at least one piece of fault replacement information.
  • Writing the first data held in the first module's second storage unit into the non-volatile storage medium, and backfilling it into the first module when the first module is reset, ensures that the first data is not lost and thus preserves data integrity.
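A minimal Python sketch of this save-and-backfill flow follows. It assumes the non-volatile storage medium can be modelled as a dictionary holding a JSON string; `FirstModule`, `save_to_nv`, `reset_first_module`, and the key names are hypothetical, and real firmware would write to flash or another persistent device instead.

```python
import json


class FirstModule:
    def __init__(self) -> None:
        self.first_register: list[dict] = []   # fault replacement information set
        self.spare_cells: dict[str, int] = {}  # second storage units: the first data

    def reset(self) -> None:
        self.first_register = []
        self.spare_cells = {}


def save_to_nv(module: FirstModule, nv: dict[str, str]) -> None:
    """Processor core copies the register contents and first data to NV storage."""
    nv["first_module"] = json.dumps({
        "fault_replacement_info": module.first_register,
        "first_data": module.spare_cells,
    })


def reset_first_module(module: FirstModule, nv: dict[str, str], warm: bool) -> None:
    """Reset the first module and, on a warm reset only, backfill it from NV storage."""
    module.reset()
    if not warm or "first_module" not in nv:
        return  # cold reset (memory cleared) or nothing saved yet: leave it empty
    saved = json.loads(nv["first_module"])
    module.first_register = saved["fault_replacement_info"]
    module.spare_cells = saved["first_data"]
```

Serializing to a string stands in for the copy to a medium that survives the reset; the backfill then rebuilds fresh objects rather than reusing the cleared ones.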
  • The processor core is further configured to perform a reset operation on the first module to initialize it and, when the reset operation is a warm reset, obtain the fault replacement information set from the non-volatile storage medium and perform a reverse replacement operation on the data in the storage units of the memory according to that set.
  • The fault replacement information set includes at least one piece of fault replacement information.
  • The reverse replacement operation rewrites the data in the second storage unit back into the first storage unit, restoring the distribution of the memory data across storage units to its initial state.
  • Restoring this distribution to its initial state does not mean clearing the data in the memory; it means storing the data in the layout used before fault replacement was applied.
  • A storage unit in the memory may have met the fault replacement condition before the reset and yet become usable again after the processor core and the memory controller are reset. Performing the reverse replacement operation on the data in the memory after that reset therefore releases the backup storage units, which helps extend the service life of the memory.
  • The processor core is further configured not to obtain the fault replacement information set from the non-volatile storage medium when the reset operation is a cold reset.
  • The processor core is further configured not to obtain the fault replacement information set or the first data from the non-volatile storage medium when the reset operation is a cold reset.
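The reverse replacement operation, together with the cold-reset case in which nothing is restored, might look like this in Python. It assumes memory is modelled as a dict from hypothetical location strings to data and that each saved record is a `(first_unit, second_unit)` pair; `restore_after_reset` is an illustrative name, not one from the application.

```python
def restore_after_reset(memory: dict[str, int],
                        saved_records: list[tuple[str, str]],
                        warm: bool) -> list[str]:
    """Reverse replacement after the core and memory controller are reset.

    memory: location -> data (hypothetical model of the memory).
    saved_records: (first_unit, second_unit) pairs read back from NV storage.
    Returns the backup (second) units released by the operation.
    """
    if not warm:
        return []  # cold reset: the set is not read from NV storage at all
    released = []
    # Undo the newest replacement first so that chained replacements
    # (a backup that was itself later replaced) unwind correctly.
    for first_unit, second_unit in reversed(saved_records):
        memory[first_unit] = memory.pop(second_unit)  # move data back, free backup
        released.append(second_unit)
    return released
```

Popping the backup entry models releasing that unit for future fault replacements, which is the service-life benefit the text describes.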
  • In a third aspect, this application provides a reset method that can be used for managing memory data.
  • The method is applied to a reset system.
  • The system includes a reset control circuit, a processor core, and a first module.
  • The first module includes a first register.
  • The first register is used to store fault replacement information.
  • The fault replacement information includes the location information of the first storage unit.
  • The first storage unit is the faulty storage unit in the memory that undergoes fault replacement.
  • The reset control circuit obtains a warm reset signal and, in response, sends a reset instruction to the second module.
  • The second module includes the processor core and does not include the first module.
  • The reset instruction is used to trigger the reset operation.
  • The third aspect of this application can also perform the steps in the various implementations of the first aspect. For the specific steps of the third aspect and its possible implementations, and for the beneficial effects of each, refer to the descriptions of the possible implementations of the first aspect; details are not repeated here.
  • In a fourth aspect, this application provides a data processing method that can be used for managing memory data.
  • The method is applied to a data processing system.
  • The data processing system includes a processor core and a first module.
  • The first module includes a first register.
  • The first register is used to store fault replacement information.
  • The fault replacement information includes the location information of the first storage unit.
  • The first storage unit is the faulty storage unit in the memory that undergoes fault replacement.
  • The processor core obtains the fault replacement information from the first register and writes it into the non-volatile storage medium, so that the fault replacement information is not lost when the processor core and the first module perform a reset operation.
  • The fourth aspect of this application can also perform the steps in the various implementations of the second aspect. For the specific steps of the fourth aspect and its possible implementations, and for the beneficial effects of each, refer to the descriptions of the possible implementations of the second aspect; details are not repeated here.
  • This application further provides a computer device configured with the reset system of the first aspect or with the data processing system of the second aspect.
  • This application further provides a chip system including a processor, configured to support implementation of the functions in the above aspects, for example, sending or processing the data and/or information involved in the above methods.
  • The chip system further includes a memory that stores the program instructions and data necessary for the server or the communication device.
  • The chip system may consist of chips, or may include chips and other discrete components.
  • FIG. 1 is a schematic structural diagram of a reset system provided by an embodiment of this application.
  • FIG. 2 is a schematic diagram of a workflow of the reset system provided by an embodiment of this application.
  • FIG. 3 is a schematic diagram of the fault replacement technique in the reset method provided by an embodiment of this application.
  • FIG. 4 is a schematic system diagram of a reset system provided by an embodiment of this application.
  • FIG. 5 is a schematic diagram of a workflow of a data processing system provided by an embodiment of this application.
  • FIG. 6 is a schematic diagram of the reverse replacement operation in the data processing method provided by an embodiment of this application.
  • FIG. 7 is a schematic system diagram of a reset system provided by an embodiment of this application.
  • FIG. 8 is another schematic system diagram of the reset system provided by an embodiment of this application.
  • FIG. 9 is a schematic system diagram of a data processing system provided by an embodiment of this application.
  • FIG. 10 is another schematic system diagram of the data processing system provided by an embodiment of this application.
  • FIG. 11 is a schematic structural diagram of a computer device provided by an embodiment of this application.
  • The embodiments of this application provide a reset system, a data processing system, and related devices. They introduce the concept of fault replacement information and add a first register dedicated to recording it, so that the information is not lost during a reset. After the reset operation completes, the data in the memory can therefore be accessed correctly according to the fault replacement information, and memory data is not lost even when both fault replacement and reset are used on the memory.
  • FIG. 1 can also be regarded as a schematic structural diagram of the data processing system provided by the embodiment of the present application.
  • The reset system includes a processor and a memory, which can be configured in any form of electronic device.
  • The processor integrates a processor core (core), a reset control circuit, a memory controller (double data rate SDRAM controller, DDRC), and a high-speed physical-layer interface transceiver (high-speed physical layer, HSPHY).
  • A software system can run on the processor core to provide the basic functions of an operating system.
  • The reset control circuit triggers modules or units in the processor to perform a reset operation, and also triggers the memory to perform a reset operation.
  • The memory controller converts the address in an access request issued by the processor core into a physical address in the memory and forwards the request to HSPHY. It also schedules the access requests issued by the processor core efficiently, and performs fault replacement operations on storage units in the memory.
  • HSPHY communicates with the memory outside the processor. It takes the digital signals generated by the memory controller, converts them into electrical signals, and transmits them to the memory; it also takes the electrical signals generated by the memory, converts them into digital signals, and transmits them to the memory controller.
  • The processor may also include more or fewer modules or units.
  • The memory controller need not be integrated into the processor; that is, the memory controller and the processor may be two independent devices. FIG. 1 is only an example to aid understanding of the application environment of this solution and does not limit the solution.
  • The embodiments of this application provide a reset system.
  • The reset system adds a first module for storing fault replacement information.
  • The fault replacement information reflects the distribution of data in the memory. Ensuring that it is not lost when the reset system is reset means that, after the reset, the data in the memory can still be accessed correctly according to it, so memory data is not lost when both fault replacement and reset are used.
  • In one implementation, the first module is kept from performing the reset operation so that fault replacement information is not lost; in another implementation, the fault replacement information in the first module is written into a non-volatile storage medium outside the reset system, so that resetting the reset system does not lose it. The two approaches operate quite differently and are described separately below.
  • FIG. 2 is a schematic diagram of a workflow of the reset system provided in the embodiments of this application.
  • The workflow of the reset system provided in the embodiments of this application may include the following:
  • The processor core sends a fault replacement instruction to the memory controller.
  • The processor core can learn in real time whether the memory is faulty.
  • The memory controller performs error correction according to the instruction.
  • ECC: error correction code.
  • CE: correctable error.
  • The memory may include one or more memory modules; a memory module may include one or two memory planes (ranks); a memory plane may include multiple memory particles (devices); a memory particle may include multiple memory blocks (banks); a memory block may include multiple memory rows (rows); and a memory row may include multiple memory storage cells (cells). Note that this division of the memory is by storage-space size; in practice, the memory can also be divided from other perspectives.
  • The description information includes at least the location information of the CE error, which indicates the storage unit in the memory where the CE error occurred.
  • A storage unit in the memory in the embodiments of this application may specifically be one or more of the following: a memory storage cell, memory row, memory block, memory particle, memory plane, or memory module.
  • The description information may also include the CE error type.
  • CE error types include, but are not limited to, CE errors generated when the processor core accesses the memory, CE errors generated during periodic memory inspection, and other types of CE errors, which are not listed exhaustively here.
  • Because the granularity of a storage unit can be any of memory storage cell, memory row, memory block, memory particle, memory plane, or memory module, the fault replacement information can reflect fault replacement operations at any of these granularities; that is, this solution supports fault replacement at any granularity, which makes it more flexible to implement.
  • After the processor core obtains the description information of a CE error, it can determine where in the memory the error occurred and then whether the storage unit meets the fault replacement condition. If the condition is met, the processor core sends a fault replacement instruction to the memory controller; if not, it continues to monitor the memory.
  • By granularity, fault replacement can be divided into several types: memory module replacement, memory plane replacement, memory particle replacement, memory block replacement, memory row replacement, and memory storage cell replacement.
  • Correspondingly, the fault replacement conditions may include memory module, memory plane, memory particle, memory block, memory row, and memory storage cell fault replacement conditions.
  • For example, the memory module fault replacement condition may be that the number of CE errors occurring in the same memory module is greater than or equal to a first preset threshold, or that the number of CE errors of the same type occurring in the same memory module is greater than or equal to a second preset threshold; other conditions are also possible.
  • The first and second preset thresholds can be set flexibly according to the actual situation and are not limited here.
  • The memory plane, memory particle, memory block, memory row, and memory storage cell fault replacement conditions are defined analogously to the memory module fault replacement condition; refer to the description above, which is not repeated here.
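The example memory module conditions above amount to counting CE reports per unit. The Python sketch below is an illustration under stated assumptions: it applies both example conditions and reports a unit if either holds, whereas the text presents them as alternatives, and the function name `units_meeting_condition` and the `(unit, ce_type)` event format are hypothetical.

```python
from collections import Counter
from typing import Iterable


def units_meeting_condition(ce_events: Iterable[tuple[str, str]],
                            first_threshold: int,
                            second_threshold: int) -> set[str]:
    """Return the units that meet an example fault replacement condition.

    ce_events: (unit_location, ce_error_type) pairs taken from the
    description information of each reported CE error.
    """
    events = list(ce_events)                       # iterated twice below
    total = Counter(unit for unit, _ in events)    # all CE errors per unit
    per_type = Counter(events)                     # CE errors per (unit, type)
    hits = {u for u, n in total.items() if n >= first_threshold}
    hits |= {u for (u, _), n in per_type.items() if n >= second_threshold}
    return hits
```

A real implementation would presumably choose one condition per granularity and tune the thresholds, which the text leaves open.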
  • the fault replacement instruction carries at least the location information of the faulty storage unit and the location information of the backup storage unit.
  • The location information of the faulty storage unit can be expressed as a character string encoding the storage unit being replaced. The location information of the backup storage unit can likewise be expressed as a character string encoding the backup storage unit.
  • the fault replacement instruction may also carry a fault replacement type.
  • The fault replacement type can also be expressed as a character code. For example, 00 may indicate that the fault replacement type is memory block replacement and 01 that it is memory plane replacement. The remaining codes are not enumerated here.
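The instruction contents above can be modeled as a small record. This is only a sketch: the `FaultReplacementInstruction` class, the `|`-separated encoding, and location strings such as `R10D05B14` are illustrative assumptions. Only the type codes 00 and 01 come from the text.

```python
from dataclasses import dataclass

# Type codes from the example in the text; other codes are not enumerated there.
REPLACEMENT_TYPES = {"00": "memory block", "01": "memory plane"}


@dataclass(frozen=True)
class FaultReplacementInstruction:
    faulty_location: str   # string encoding the faulty storage unit
    backup_location: str   # string encoding the backup storage unit
    type_code: str = "00"  # optional fault replacement type

    def encode(self) -> str:
        # Hypothetical wire format: fields joined with '|' (fields must not contain '|').
        return "|".join((self.type_code, self.faulty_location, self.backup_location))

    @classmethod
    def decode(cls, message: str) -> "FaultReplacementInstruction":
        type_code, faulty, backup = message.split("|")
        if type_code not in REPLACEMENT_TYPES:
            raise ValueError(f"unknown fault replacement type code: {type_code}")
        return cls(faulty, backup, type_code)
```

For example, `FaultReplacementInstruction("R10D05B14", "R18D13B22").encode()` yields `00|R10D05B14|R18D13B22`, which the memory controller side could decode back into the same record.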
  • the memory controller performs fault replacement processing on the storage unit in the memory according to the received fault replacement instruction, and writes the fault replacement information into the first register.
  • When the memory controller receives the fault replacement instruction, it can determine from the location information of the faulty storage unit which storage unit in the memory needs to be replaced and isolated, and from the location information of the backup storage unit where the data will be stored after the replacement. It then performs fault replacement on the storage unit: in a fault replacement operation, the memory controller reads the data out of the faulty storage unit and writes it into the backup storage unit. The memory controller may be integrated in the processor or may be a device separate from the processor. Optionally, in a fault replacement operation, the memory controller also needs to reorganize the storage units in the memory.
  • The faulty storage unit is always located in the memory, but the backup storage unit is not necessarily located in the memory.
  • a backup storage unit for storing data in the faulty memory storage cell may be integrated in the first module. That is, when the granularity of a certain faulty storage unit is a memory storage cell, the memory controller writes the data in the faulty memory storage cell into the backup storage unit in the first module.
  • When the granularity of the faulty storage unit is a memory row, a memory block, a memory particle, a memory plane, or a memory module, the corresponding backup storage unit can be set in the memory.
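The placement rule for backup storage units, together with the copy step of a fault replacement operation, can be sketched as follows. The dictionaries standing in for the memory and for the first module's spare units, and all function names, are hypothetical; reorganization of storage units is not modeled.

```python
from enum import Enum


class Granularity(Enum):
    MODULE = "memory module"
    PLANE = "memory plane"
    PARTICLE = "memory particle"
    BLOCK = "memory block"
    ROW = "memory row"
    CELL = "memory storage cell"


def backup_home(granularity: Granularity) -> str:
    """Where the backup storage unit lives, following the example in the text."""
    return "first module" if granularity is Granularity.CELL else "memory"


def perform_replacement(memory: dict, first_module_spares: dict,
                        granularity: Granularity, faulty: str, backup: str) -> None:
    """One fault replacement: read the faulty unit's data, write it to the backup."""
    data = memory[faulty]  # CE errors are correctable, so the data can still be read
    if backup_home(granularity) == "first module":
        first_module_spares[backup] = data
    else:
        memory[backup] = data
```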
  • FIG. 3 is a schematic diagram of the fault replacement technology in the reset method provided by the embodiment of the application.
  • Figure 3 takes a memory particle as the storage unit to be replaced.
  • Figure 3 includes three sub-schematics (a), (b) and (c).
  • Sub-diagram (a) shows the data distribution in the memory module before the fault replacement operation.
  • The memory module includes two memory planes (Rank A and Rank B).
  • Each memory plane includes 18 memory particles.
  • The 18 memory particles include 16 particles for normal data storage, one ECC error correction particle, and one parity check particle.
  • the ECC error correction particles can also be regarded as backup particles.
  • Rank A and Rank B can be reorganized so that they share the ECC error correction particle of Rank B, turning the two 16+2 storage layouts in sub-diagram (a) into the 32+3 layout in sub-diagram (c). It should be understood that the example in FIG. 3 is only intended to aid understanding of the fault replacement technique and does not limit the solution.
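The particle counts in Figure 3 balance as follows. Reading the drop from 4 check particles to 3 as one ECC particle being consumed as the spare for the faulty data particle, which is then isolated, is an interpretation of the figure rather than something the text states:

```latex
\underbrace{2 \times (16 + 2)}_{\text{(a): Rank A and Rank B}} = 36
\qquad\longrightarrow\qquad
\underbrace{32 + 3}_{\text{(c): in use}} + \underbrace{1}_{\text{isolated}} = 36
```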
  • After performing fault replacement on the storage unit in the memory, the memory controller writes the fault replacement information into the first register.
  • The fault replacement information includes the location information of the first storage unit and the location information of the second storage unit.
  • The first storage unit is the storage unit with CE errors that was replaced during fault replacement (that is, the storage unit that is replaced), and the second storage unit is the backup storage unit used during fault replacement (that is, the storage unit used after the replacement). The fault replacement information therefore reflects how the data in the memory is distributed across storage units after the faulty storage unit in the memory has been replaced.
  • A piece of fault replacement information may also include the granularity level of the faulty storage unit, the CE error type of the replaced faulty storage unit, or other information.
  • Because the fault replacement information includes at least the location information of the replaced storage unit and that of the backup storage unit replacing it, it records each fault replacement operation that occurs in the memory. It reflects not only which storage units in the current memory have been replaced and isolated because of faults, but also in which storage unit the data is stored after the replacement, giving a direct view of how the data in the memory is distributed across storage units.
  • In Table 1, region0 indicates that the fault replacement operation corresponding to this fault replacement information occurred in the region numbered 0 in the memory.
  • region0-enable is the field of the fault replacement information indicating whether fault replacement has been performed in region0. The code 0 in region0-enable indicates that fault replacement has been performed in region0.
  • region0-size is the field of the fault replacement information indicating the granularity of the storage unit replaced in region0.
  • The code 00 in region0-size indicates that the granularity of the storage unit replaced in region0 is a memory block (bank).
  • region0-rank indicates the number of the memory plane in which the memory block to be replaced in region0 is located.
  • region0-device indicates the number of the memory particle in which the memory block to be replaced in region0 is located.
  • region0-bank indicates the number of the memory block to be replaced in region0.
  • region0-rank, region0-device, and region0-bank together indicate the location of the first storage unit. As shown in Table 1, the first storage unit is memory block 14 in memory particle 5 in memory plane 10 of region0.
  • region0-buddy-rank indicates the number of the memory plane in which the backup memory block is located.
  • region0-buddy-device indicates the number of the memory particle in which the backup memory block is located.
  • region0-buddy-bank indicates the number of the backup memory block in region0.
  • region0-buddy-rank, region0-buddy-device, and region0-buddy-bank together indicate the location of the second storage unit. As shown in Table 1, the second storage unit is memory block 22 in memory particle 13 in memory plane 18 of region0.
  • the fault replacement information may include more or less information, and the examples in Table 1 are only to facilitate understanding of the concept of fault replacement information, and are not used to limit the solution.
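The Table 1 example can be captured as a record whose fields mirror the region0-* fields. The class name, the size-code lookup, and the (rank, device, bank) tuple ordering are assumptions for illustration; only the field values are taken from the text.

```python
from dataclasses import dataclass

SIZE_CODES = {"00": "bank"}  # only code 00 is given in Table 1


@dataclass(frozen=True)
class FaultReplacementInfo:
    region: int
    enable: int        # region-enable; Table 1 uses code 0 for "replacement performed"
    size: str          # region-size granularity code
    rank: int          # memory plane of the first (replaced) storage unit
    device: int        # memory particle of the first storage unit
    bank: int          # memory block of the first storage unit
    buddy_rank: int    # memory plane of the second (backup) storage unit
    buddy_device: int  # memory particle of the second storage unit
    buddy_bank: int    # memory block of the second storage unit

    def first_unit(self) -> tuple:
        return (self.rank, self.device, self.bank)

    def second_unit(self) -> tuple:
        return (self.buddy_rank, self.buddy_device, self.buddy_bank)

    def describe(self) -> str:
        unit = SIZE_CODES.get(self.size, f"size code {self.size}")
        return f"region{self.region}: {unit} {self.first_unit()} -> {self.second_unit()}"


# Values taken from the Table 1 example in the text.
TABLE_1 = FaultReplacementInfo(region=0, enable=0, size="00", rank=10, device=5,
                               bank=14, buddy_rank=18, buddy_device=13, buddy_bank=22)
```

Here `TABLE_1.describe()` reads `region0: bank (10, 5, 14) -> (18, 13, 22)`, matching the first and second storage units described above.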
  • the first register belongs to the first module, and fault replacement information is stored in the first register.
  • the first register may specifically be represented as a status register, a configuration register, or other types of registers, etc., which is not limited here.
  • the first module can be integrated in the memory controller. Further, one first register stores one fault replacement information, and the first module can be configured with multiple first registers to record multiple fault replacement information.
  • The first module may also be configured with at least one second storage unit. When a first storage unit is a faulty storage unit at memory storage cell granularity, a second storage unit in the first module stores the data of that first storage unit.
  • Because the reset instruction is not sent to the first module during the reset process, the data in this backup storage unit is not cleared when the reset operation completes, which ensures data integrity.
  • The granularity of the backup storage units configured in the first module may be a memory storage cell, a memory row, or another granularity, and the first module may contain 32, 64, 128, or another number of backup storage units.
  • a second register may also be configured in the reset system.
  • the second register is used to record the state of the memory controller in the fault replacement operation.
  • The status may be: no fault replacement operation, fault replacement operation in progress, fault replacement operation succeeded, fault replacement operation failed, or another type of status. This is not limited here.
  • the second register can also be integrated in the memory controller.
  • one or more sets of registers may be configured in the reset system, and each set of registers includes a first register and a second register.
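A rough model of the register arrangement described above, in which each register set pairs a first register holding one piece of fault replacement information with a second register holding the replacement status. The class and state names are hypothetical, and keeping both registers in one object is a simplification: the text allows the second register to sit elsewhere, for example in the memory controller.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Optional


class ReplacementStatus(Enum):
    """Example states for the second register; the text allows other states."""
    NONE = auto()         # no fault replacement operation
    IN_PROGRESS = auto()  # fault replacement operation in progress
    SUCCEEDED = auto()    # fault replacement operation succeeded
    FAILED = auto()       # fault replacement operation failed


@dataclass
class RegisterSet:
    first_register: Optional[dict] = None  # one piece of fault replacement information
    second_register: ReplacementStatus = ReplacementStatus.NONE


@dataclass
class FirstModule:
    register_sets: List[RegisterSet] = field(default_factory=list)
    spare_cells: dict = field(default_factory=dict)  # cell-granularity backup units

    @classmethod
    def with_register_sets(cls, count: int) -> "FirstModule":
        return cls([RegisterSet() for _ in range(count)])

    def record(self, info: dict) -> int:
        """Write one record into the next free first register and mark it done."""
        for index, regs in enumerate(self.register_sets):
            if regs.first_register is None:
                regs.first_register = info
                regs.second_register = ReplacementStatus.SUCCEEDED
                return index
        raise RuntimeError("all first registers are occupied")
```

Configuring several register sets, as the text suggests, is what lets the first module hold more than one piece of fault replacement information at a time.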
  • the reset control circuit obtains the reset signal.
  • A reset operation may be required while the reset system is running, in which case the reset control circuit obtains a reset signal.
  • Reset means restoring the module/unit/device being reset to the state it was in when first powered on.
  • the reset control circuit can be integrated in the processor.
  • the reset control circuit can determine whether the received reset signal is a warm reset signal or a cold reset signal.
  • the cold reset signal is generally caused by a memory failure and is used to trigger a cold reset operation.
  • A cold reset operation restores the entire reset system and the memory to the initial power-on state and is generally performed by powering off and on again.
  • the warm reset signal is generally caused by a non-memory fault and is used to trigger a warm reset operation.
  • A warm reset operation means that some modules/units/devices are not reset while the system is being reset.
  • The reset control circuit may include a first pin and a second pin. A reset signal obtained from the first pin is a cold reset signal, and a reset signal obtained from the second pin is a warm reset signal.
  • the reset signal may be represented as a group of low-level signals, and the aforementioned group of low-level signals may include one or more low-level signals.
  • the reset control circuit obtains the cold reset signal and the warm reset signal from the same signal source, and the cold reset signal and the warm reset signal are specifically represented as different electrical signals.
  • For example, the cold reset signal may be represented as a 01, 0101, or 0011 signal, and the warm reset signal as a 10, 1010, or 1100 signal, where "0" denotes a low-level signal and "1" a high-level signal. The reset control circuit can therefore determine from the form of the received electrical signal whether it is a cold reset signal or a warm reset signal. These examples are only intended to aid understanding and do not limit the solution.
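Both ways of telling cold and warm reset signals apart can be sketched in a few lines. The pin labels and the `None` result for unrecognized patterns are assumptions; the bit patterns are the examples given in the text.

```python
from typing import Optional

# Example encodings from the text; "0" = low level, "1" = high level.
COLD_PATTERNS = {"01", "0101", "0011"}
WARM_PATTERNS = {"10", "1010", "1100"}


def classify_by_pin(pin: str) -> str:
    """Variant 1: the input pin identifies the type (first = cold, second = warm)."""
    return {"first": "cold", "second": "warm"}[pin]


def classify_by_pattern(levels: str) -> Optional[str]:
    """Variant 2: one signal source, type encoded in the electrical signal."""
    if levels in COLD_PATTERNS:
        return "cold"
    if levels in WARM_PATTERNS:
        return "warm"
    return None  # unrecognized pattern; handling is not specified in the text
```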
  • the reset control circuit sends a reset instruction to the second module.
  • the second module includes the processor core and does not include the first module.
  • In response to the obtained reset signal, the reset control circuit controls the processor core to perform the reset operation and controls the first module and the memory not to perform it. That is, the reset control circuit sends a reset instruction to the second module, which includes the processor core and does not include the first module.
  • The second module may also include other modules of the reset system apart from the first module, as long as the first module and the memory do not perform the reset operation. It should be noted that the second module may be a conceptual module defined purely for the purpose of description.
  • In response to the acquired reset signal, the reset control circuit sends a reset instruction to the processor core but not to the first module or the memory.
  • The reset instruction triggers execution of the reset operation. The processor core is thus controlled to reset while the first module is controlled not to, so the data stored in the first module is not reset, that is, not cleared.
  • The reset instruction may be a group of low-level signals containing at least one low-level signal, or a group of electrical signals containing both low-level and high-level signals. This is not limited here.
  • When the acquired reset signal is a warm reset signal, the reset control circuit sends a reset instruction to the processor core but not to the first module or the memory.
  • When the acquired reset signal is a cold reset signal, a reset instruction is sent to the processor core, the first module, and the memory. That is, the first module and the memory are controlled not to perform the reset operation only when the reset control circuit obtains a warm reset signal.
  • the reset control circuit may also send a first instruction to the first module and the memory respectively, and the first instruction instructs not to perform the reset operation. Therefore, the processor core performs the reset operation after receiving the reset instruction, and the first module and memory do not perform the reset operation after receiving the first instruction, so as to control the processor core to perform the reset operation and control the first module and the memory not to execute Reset operation.
  • the reset control circuit sends the first instruction to the first module.
  • The reset instruction and the first instruction can be two different electrical signals, so the reset control circuit can send the reset instruction to the processor core and the first instruction to the first module by sending each of them a different electrical signal.
  • The first module can determine from the form of the received electrical signal whether the received instruction is the first instruction. For example, the reset instruction may be 111000 and the first instruction 000111, where "0" denotes a low-level signal and "1" a high-level signal.
  • a third pin and a fourth pin may be provided in the first module.
  • If the reset control circuit is to send a reset instruction to the first module, it sends the instruction to the third pin, and the first module correspondingly treats an instruction received through the third pin as a reset instruction. If the reset control circuit is to send the first instruction to the first module, it sends the instruction to the fourth pin, and the first module correspondingly treats an instruction received through the fourth pin as the first instruction.
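On the receiving side, the first module's decoding of the two instructions can be sketched as below. The action names `reset` and `keep` and the list standing in for the first registers are hypothetical; the 111000/000111 patterns and the third/fourth pin assignment are the examples from the text.

```python
RESET_INSTRUCTION = "111000"  # example pattern from the text
FIRST_INSTRUCTION = "000111"  # example pattern: "do not reset"


def decode_by_pattern(levels: str) -> str:
    """Single input: the electrical pattern tells the two instructions apart."""
    if levels == RESET_INSTRUCTION:
        return "reset"
    if levels == FIRST_INSTRUCTION:
        return "keep"
    raise ValueError(f"unrecognized instruction pattern: {levels}")


def decode_by_pin(pin: str) -> str:
    """Two inputs: the third pin carries reset, the fourth pin the first instruction."""
    return {"third": "reset", "fourth": "keep"}[pin]


def apply_to_first_module(registers: list, action: str) -> list:
    # A reset clears the recorded fault replacement information; "keep" preserves it.
    return [None] * len(registers) if action == "reset" else registers
```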
  • the implementation manner of the reset control circuit sending the first instruction to the memory is similar to the implementation manner of the reset control circuit sending the first instruction to the first module, and will not be repeated here.
  • When the acquired reset signal is a warm reset signal, the reset control circuit sends a reset instruction to the processor core and the first instruction to the first module and the memory.
  • When the acquired reset signal is a cold reset signal, the reset control circuit sends a reset instruction to the processor core, the first module, and the memory. That is, the first module and the memory are controlled not to perform the reset operation only when the reset control circuit obtains a warm reset signal.
  • Step 204 may also include: after obtaining the reset signal, the reset control circuit, in response to it, sends a reset instruction to the processor core and does not send a reset instruction to the memory controller. That is, the reset control circuit sends a reset instruction to the second module, and the second module includes the processor core and does not include the memory controller.
  • the specific implementation of the reset control circuit controlling the processor core to perform the reset operation is the same as the above description.
  • The reset control circuit controls the memory controller not to perform the reset operation in a way similar to that described above, except that the object being controlled is the entire memory controller rather than the first module. Details are not repeated here.
  • The fault replacement information recorded in the first module indicates how the data in the memory is distributed across storage units, and the memory controller is used to manage the memory.
  • Integrating the first module into the memory controller therefore makes it easier for the memory controller to manage the first module and to read the fault replacement information when managing the memory. In addition, controlling the entire memory controller not to reset avoids desynchronization between different modules of the memory controller after the reset.
  • the reset control circuit sends a reset instruction to the processor core and the first module.
  • When the reset control circuit determines that the acquired reset signal is a cold reset signal, it sends a reset instruction to the processor core, the first module, and the memory so that they perform the reset operation.
  • the reset control circuit may also send a reset instruction to other modules in the reset system.
  • The reset control circuit may include a logic circuit. When the reset control circuit obtains a warm reset signal, its output terminal is not coupled to the first module; when it obtains a cold reset signal, its output terminal is coupled to the first module.
  • If the reset control circuit obtains a cold reset signal, the reset operation was triggered by a memory fault.
  • In that case the memory needs to be reset, that is, the data in the memory will be cleared, so there is no need to ensure that the data in the memory is preserved.
  • The first module also performs the reset operation, so that after the reset completes, new fault replacement information can be written into the first module, keeping the entire reset system synchronized.
  • When the first module is integrated in the memory controller and the reset control circuit determines that the acquired reset signal is a cold reset signal, the reset control circuit controls the processor core, the memory controller, and the memory to perform the reset operation.
  • the specific implementation is similar to the above, with the difference that the first module in the above description is replaced with a memory controller, which is not repeated here.
  • FIG. 4 is a system schematic diagram of a reset system provided by an embodiment of this application.
  • Figure 4 takes the first module integrated in the memory controller and the memory controller integrated in the processor as an example.
  • When the reset control circuit obtains a cold reset signal, it sends a reset instruction to the processor core, the memory controller, HSPHY, and the memory, so that the entire reset system and the memory perform the reset operation.
  • When the reset control circuit obtains a warm reset signal, it sends a reset instruction to the processor core but not to the memory controller, HSPHY, or the memory, so that the first module does not perform the reset operation.
  • FIG. 4 is only for a more intuitive understanding of the solution, and is not used to limit the solution.
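The Figure 4 behavior amounts to a small dispatch table. This sketch assumes the component names shown in Figure 4 and uses the hypothetical label `no reset` for components that receive no reset instruction; it is an illustration, not a description of the actual circuit.

```python
from typing import Dict

# Components in the Figure 4 example; the first module sits inside the memory controller.
COMPONENTS = ("processor core", "memory controller", "HSPHY", "memory")


def dispatch_reset(signal_type: str) -> Dict[str, str]:
    """Return, per component, whether it receives a reset instruction."""
    if signal_type == "cold":
        # Memory fault: the whole reset system and the memory are reset.
        return {name: "reset" for name in COMPONENTS}
    if signal_type == "warm":
        # Non-memory fault: only the processor core is reset, so the fault
        # replacement information in the first module survives.
        return {name: ("reset" if name == "processor core" else "no reset")
                for name in COMPONENTS}
    raise ValueError(f"unknown reset signal type: {signal_type!r}")
```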
  • This embodiment introduces the concept of fault replacement information and adds to the reset system a first register dedicated to storing it. During the reset operation, the first module is controlled not to reset, so the fault replacement information in the first register is preserved after the reset completes. Even if some storage units in the memory have been isolated and replaced through fault replacement of faulty storage units, after the system is reset
  • the fault replacement information still identifies which storage units in the memory are isolated faulty storage units. This avoids system downtime caused by accessing isolated faulty storage units and enables correct access to the memory, so that the data in the memory is not lost when fault replacement and reset techniques are used together.
  • FIG. 5 is a schematic diagram of a work flow of the data processing system provided in the embodiment of this application.
  • the work flow of the data processing system provided in the embodiment of this application may include:
  • the processor core sends a fault replacement instruction to the memory controller.
  • the memory controller performs fault replacement processing on the storage unit in the memory according to the received fault replacement instruction, and writes the fault replacement information into the first register.
  • steps 501 and 502 are similar to the specific implementation manners of steps 201 and 202 in the embodiment corresponding to FIG.
  • the processor core writes the fault replacement information into the non-volatile storage medium.
  • The processor core can read the fault replacement information from the first register in the first module and write the newly generated fault replacement information into the non-volatile storage medium.
  • the non-volatile storage medium may specifically be a hard disk, a complex programmable logic device (CPLD), an electrically erasable programmable read only memory (EEPROM), or other types of non-volatile storage media.
  • the non-volatile storage medium and the processor core may be configured in the same device, or may be configured in a different device from the processor core.
  • the processor core and the non-volatile storage medium can communicate data through an internal interface or an external interface.
  • the internal interface includes but is not limited to a bus, and the external interface includes a wired communication interface and a wireless communication interface.
  • The processor core reads the fault replacement information from the first register as follows: after writing the fault replacement information into the first register, the memory controller signals to the processor core that the fault replacement has been completed, and the processor core reads the fault replacement information from the first module after receiving this completion signal.
  • For example, a second register is configured in the data processing system. After writing the fault replacement information into the first module, the memory controller writes into the second register the information that the fault replacement operation succeeded, which serves as the completion signal.
  • After reading the information in the second register, the processor core determines that the memory controller has completed the fault replacement operation and copies the fault replacement information from the first register.
  • the processor core writes the first data stored in the second storage unit in the first module into the non-volatile storage medium.
  • The first module may further include at least one second storage unit, which stores the first data of a first storage unit whose granularity is a memory storage cell.
  • The memory controller can write the first data in the faulty storage unit into the second storage unit (that is, the backup storage unit) in the first module. After the memory controller writes the fault replacement information into the first register in the first module, the processor core can read the first data from the second storage unit in the first module and write it into the non-volatile storage medium, so that the first data is not lost when the processor core and the first module perform the reset operation.
  • The processor core reads the first data from the backup storage unit in the first module as follows: after writing the fault replacement information into the first register, the memory controller signals to the processor core that the fault replacement has been completed, and after receiving this completion signal the processor core reads the first data from the backup storage unit in the first module.
  • the specific implementation manner in which the processor core determines that the memory controller has completed the fault replacement operation has been introduced in step 503, and will not be repeated here.
  • Step 504 is optional: if there is no faulty storage unit whose granularity is a memory storage cell, step 504 does not need to be performed. If step 504 is performed, the embodiment of this application does not limit the order of steps 503 and 504: step 503 may be performed before step 504, step 504 before step 503, or the two steps may be performed at the same time.
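Steps 503 and 504 can be sketched as below, with a Python dict standing in for the non-volatile storage medium and JSON as an assumed serialization format. The function name and the keys `fault_replacement_info` and `first_data` are hypothetical.

```python
import json
from enum import Enum, auto
from typing import List, Optional


class ReplacementStatus(Enum):
    NONE = auto()
    IN_PROGRESS = auto()
    SUCCEEDED = auto()
    FAILED = auto()


def persist_after_replacement(status: ReplacementStatus,
                              first_registers: List[Optional[dict]],
                              spare_cells: dict,
                              nvm: dict) -> bool:
    """Copy fault replacement information (and cell-spare data) to the NVM stand-in.

    status: value read from the second register.
    first_registers: contents of the first registers (None means empty).
    spare_cells: first data held in the first module's second storage units.
    nvm: dict standing in for a hard disk, CPLD, EEPROM, or similar medium.
    """
    if status is not ReplacementStatus.SUCCEEDED:
        return False  # replacement not finished: nothing to copy yet
    records = [r for r in first_registers if r is not None]
    nvm["fault_replacement_info"] = json.dumps(records)  # step 503
    if spare_cells:  # step 504 is optional: only with cell-granularity spares
        nvm["first_data"] = json.dumps(spare_cells)
    return True
```

Gating the copy on the second register mirrors the completion signal described above, so the processor core never persists a half-written record.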
  • the reset control circuit obtains a reset signal.
  • step 505 is similar to the specific implementation of step 203 in the embodiment corresponding to FIG.
  • the reset control circuit sends a reset instruction to the processor core and the first module.
  • After obtaining the reset signal, the reset control circuit sends a reset instruction to the processor core and the first module regardless of whether the signal is a warm reset signal or a cold reset signal, triggering the processor core and the first module to perform the reset operation. Further, if the acquired signal is a warm reset signal, the reset control circuit does not send a reset instruction to the memory, so the memory does not perform the reset operation; if it is a cold reset signal, the reset control circuit sends a reset instruction to the memory so that the memory performs the reset operation. The form of the reset instruction has been introduced in the embodiment corresponding to FIG. 2 and is not repeated here. It should be noted that although the first module in FIG. 5 is integrated in the memory controller, in practice the first module may also be provided outside the memory controller. This is not limited here.
  • The entire data processing system may take the form of a processor, in which case the reset control circuit may send a reset instruction to the entire processor so that the whole processor performs the reset operation.
  • the processor core determines whether the reset operation is a warm reset operation, if it is a warm reset operation, go to step 508, and if it is a cold reset operation, go to step 510.
  • a third register is further provided in the reset control circuit, and the third register is used to record whether the reset signal acquired by the reset control circuit this time is a cold reset signal or a warm reset signal.
  • The processor core queries the information recorded in the third register to determine whether the reset signal that triggered the reset operation is a warm reset signal, that is, whether the reset operation is a warm reset operation.
  • the processor core performs a reset operation on the processor core and the first module.
  • initialization software is running in the processor core, and if it is determined to be a warm reset operation, the initialization software in the processor core needs to perform a reset operation on the processor core and the first module.
  • The initialization software in the processor core obtains the fault replacement information set from the non-volatile storage medium. More than one fault replacement operation may occur while the data processing system is running, and each piece of fault replacement information records the storage unit replacement performed in one fault replacement operation, so what the processor core obtains from the non-volatile storage medium is a fault replacement information set containing one or more pieces of fault replacement information.
  • The initialization software may be a basic input/output system (BIOS).
  • the initialization software in the processor core also obtains the first data from the non-volatile storage medium during the reset and startup process.
  • The initialization software in the processor core backfills the fault replacement information set into the first registers while resetting them.
  • The processor core obtains the fault replacement information set from the non-volatile storage medium and backfills it directly into the first module while the first module is being reset. After the data processing system is reset, the memory controller can therefore use the fault replacement information in the first module directly to access the memory accurately, which is simple to operate and easy to implement.
  • the initialization software in the processor core triggers the reset operation on the processor core and the first module
  • the initialization software in the processor core performs the reset operation on the first register.
  • Each piece of fault replacement information is backfilled into its own first register. Since the configuration register supports only hardware writing, whereas the status register supports both hardware writing and software writing, the first register in this implementation is a status register.
  • The initialization software in the processor core backfills the fault replacement information set into the first registers and backfills the first data into the first module during the reset operation of the first module.
  • The processor core backfills the first data into the second storage unit in the first module in a way similar to backfilling the fault replacement information into the first register. Details are not repeated here.
  • Because the first data stored in the second storage unit of the first module is also written into the non-volatile storage medium and is backfilled into the first module when the first module is reset, the first data is not lost, which ensures data integrity.
  • the initialization software in the processor core performs a reset operation on the first module to initialize the first module; and according to the failure replacement information set, performs reverse replacement operation on the data in the storage unit of the memory, reverse replacement The operation is used to write the data in the second storage unit into the first storage unit, so that the distribution of the data in the memory in the storage unit is restored to the initial state.
  • restoring the distribution of the data in the memory in the storage unit to the initial state does not mean clearing the data in the memory, but refers to storing the data in the memory according to the storage mode before the failure replacement technology is implemented.
  • a certain storage unit in the memory may also meet the failure replacement condition, that is, after the processor core and the memory controller are reset, the memory
  • the storage unit that meets the failure replacement conditions may become usable storage unit again, so after resetting the processor core and memory controller, perform reverse replacement operation on the data in the memory storage unit, which is also released
  • the backup storage unit helps to extend the service life of the memory.
  • the initialization software in the processor core performs a reset operation on the first module, so that after the first module is initialized, the failure replacement information set recorded in the first module is cleared. Since each failure replacement information records the replacement relationship between a first storage unit and a second storage unit, the initialization software in the processor core can learn the location of the first storage unit and the second storage unit based on the failure replacement information. The location of the unit, and then rewrite the data stored in a second storage unit to the first storage unit, that is, perform an inverse replacement operation on the data in the storage unit of the memory.
  • the initialization software in the processor core also needs to use the data in the parity check particles to verify the data in the second storage unit. If it is found in the second storage unit If there is an error in the data, use the data in the ECC error correction particles to correct the data in the second storage unit, and then rewrite the data in the second storage unit after the error correction process into the first storage unit .
  • the reverse replacement operation also needs to reorganize the data in the memory particles.
  • FIG. 6 is a schematic diagram of the reverse replacement operation in the data processing method provided by an embodiment of this application. It continues the example of FIG. 3.
  • FIG. 6 includes two sub-diagrams, (a) and (b).
  • sub-diagram (a) shows the data distribution in the memory module before the reverse replacement operation.
  • when the fault replacement operation was performed, the data of particle 1 in Rank A was written into the ECC error-correction particle of Rank A, and Rank A and Rank B then shared the ECC error-correction particle of Rank B. The reverse replacement operation therefore needs to rewrite the data in the ECC error-correction particle of Rank A back into particle 1 of Rank A.
  • sub-diagram (b) shows the data distribution in the memory module after the reverse replacement operation.
  • the processor core uses the parity particles of Rank A to verify the data in the ECC error-correction particle of Rank A. It finds no error, so it reads that data and writes it into particle 1 of Rank A.
  • the processor core also reorganizes the data of Rank A and Rank B by changing their storage mode back to two separate 16+2 storage modes. This restores the distribution of the data in the memory across storage units to its initial state. It should be understood that the example in FIG. 6 is provided only to aid understanding of the fault replacement technology and is not intended to limit the solution.
  • for steps 501 to 504 and steps 505 to 508, steps 505 to 508 may be executed once after steps 501 to 504 have been executed multiple times.
  • when the processor core determines that the reset operation is a cold reset operation, it does not obtain the failure replacement information set from the non-volatile storage medium. Instead, it directly performs a reset operation on the processor core, the first module, the memory controller, and the memory, that is, it initializes the entire data processing system.
  • because the reset control circuit obtained a cold reset signal, the reset operation was triggered by a memory fault. In this case the memory itself must be reset, which clears the data in the memory, so there is no need to preserve that data. Skipping the retrieval of the failure replacement information set from the non-volatile storage medium avoids a redundant step and makes the reset process more efficient.
  • steps 507 and 509 are optional. If they are not executed, step 508 can be executed directly after step 505.
  • a new concept of fault replacement information is proposed, and a first register specially used to store fault replacement information is added to the reset system.
  • the processor core writes newly generated failure replacement information into the non-volatile storage medium. As a result, a reset of the data processing system does not cause the failure replacement information to be lost, even when failure replacement processing of failed storage units has isolated and replaced some storage units in the memory.
  • FIG. 7 is a schematic diagram of a reset system provided by an embodiment of the application.
  • the reset system 700 may include a reset control circuit 701, a processor core 7021, and a first module 703.
  • the first module 703 includes a first register used to store failure replacement information. The failure replacement information includes location information of the first storage unit, which is the storage unit that has failed when failure replacement is performed on storage units in the memory.
  • the failure replacement information also includes location information of the second storage unit, which is the backup storage unit used when failure replacement is performed on storage units in the memory.
  • the granularity of the first storage unit is any one of the following: memory storage cell, memory row, memory block, memory particle, memory plane, or memory module.
  • the first module 703 further includes at least one second storage unit, which is used to store the data of a first storage unit whose granularity is a memory storage cell.
  • FIG. 8 is a system schematic diagram of the reset system provided by an embodiment of the application.
  • the reset system 700 further includes a memory controller 704, in which the first module 703 is integrated.
  • the reset control circuit 701 is specifically configured to control the processor core 7021 to perform a reset operation, and control the memory controller 704 not to perform a reset operation.
  • the reset control circuit 701 is also used to send a reset instruction to the processor core 7021 and the first module 703 when a cold reset signal is acquired.
  • the reset control circuit 701 is specifically configured to send a reset instruction to the processor core 7021 and a first instruction to the first module 703.
  • the first instruction instructs the first module 703 not to perform a reset operation.
  • FIG. 9 is a system schematic diagram of a data processing system provided by an embodiment of the present application.
  • the data processing system 900 includes a processor core 901 and a first module 902.
  • the first module 902 includes a first register.
  • the first register is used to store fault replacement information.
  • the fault replacement information includes location information of the first storage unit.
  • the first storage unit is the storage unit that has failed when failure replacement is performed on storage units in the memory.
  • the processor core 901 is configured to obtain the fault replacement information from the first register and to write it into a non-volatile storage medium. This ensures that the fault replacement information is not lost when the processor core 901 and the first module 902 perform a reset operation.
  • the failure replacement information also includes location information of the second storage unit, which is the backup storage unit used when failure replacement is performed on storage units in the memory.
  • the granularity of the first storage unit is any one of the following: memory storage cell, memory row, memory block, memory particle, memory plane, or memory module.
  • FIG. 10 is a system schematic diagram of a data processing system provided by an embodiment of this application.
  • the data processing system 900 further includes a memory controller 903, in which the first module 902 is integrated.
  • when the reset operation is a warm reset operation, the processor core 901 is further configured to obtain the failure replacement information set from the non-volatile storage medium and backfill it into the first register while the first register is being reset. The failure replacement information set includes at least one piece of failure replacement information.
  • the first module 902 further includes at least one second storage unit, which is used to store the first data of a first storage unit whose granularity is a memory storage cell.
  • the processor core 901 is further configured to obtain the first data from the second storage unit in the first module 902 and write it into the non-volatile storage medium. This ensures that the first data is not lost when the processor core 901 and the first module 902 perform a reset operation. When the reset operation is a warm reset operation, the processor core 901 obtains the failure replacement information set and the first data from the non-volatile storage medium. While the first module 902 is being reset, it backfills the set into the first register and the first data into the second storage unit in the first module 902. The failure replacement information set includes at least one piece of failure replacement information.
  • the processor core 901 is further configured to perform a reset operation on the first module 902 to initialize it. When the reset operation is a warm reset operation, the processor core 901 obtains the failure replacement information set from the non-volatile storage medium. It then performs the reverse replacement operation on the data in the storage units of the memory according to that set, which includes at least one piece of failure replacement information. The reverse replacement operation rewrites the data in the second storage unit into the first storage unit, restoring the distribution of the data in the memory across storage units to its initial state.
  • when the reset operation is a cold reset operation, the processor core 901 does not obtain the failure replacement information set from the non-volatile storage medium.
  • FIG. 11 is a schematic structural diagram of a computer device according to an embodiment of this application.
  • the reset system 700 described in the embodiment corresponding to FIG. 7 or FIG. 8 may be deployed on the computer device 110 to implement the function of the reset system in the embodiment corresponding to FIG. 2 to FIG. 4.
  • the data processing system 900 described in the embodiment corresponding to FIG. 9 or FIG. 10 may be deployed on the computer device 110 to implement the functions of the data processing system in the embodiment corresponding to FIG. 5 or FIG. 6.
  • the computer device 110 includes a wired or wireless network interface 1101, an input/output interface 1102, a processor 1103, and a memory 1104. The computer device 110 may include one or more processors 1103; FIG. 11 uses one processor as an example.
  • the processor 1103 may include an application processor 11031 and a communication processor 11032.
  • the memory 1104 may include a non-volatile storage medium 11041 and a memory 11042.
  • the wired or wireless network interface 1101, the input/output interface 1102, the processor 1103, and the memory 1104 may be connected by a bus or in another manner.
  • the memory 11042 may include a read-only memory and a random access memory, and provides instructions and data to the processor 1103.
  • a part of the non-volatile storage medium 11041 may also include a non-volatile random access memory (NVRAM).
  • the memory 1104 stores operating instructions, executable modules, or data structures, or a subset or an extended set thereof.
  • the operating instructions may include various operating instructions for implementing various operations.
  • the processor 1103 controls the operation of the computer device.
  • the components of the computer device are coupled together through a bus system, which may include a power bus, a control bus, and a status signal bus in addition to a data bus.
  • in the figure, the various buses are all referred to as the bus system.
  • the method disclosed in the foregoing embodiment of the present application may be applied to the processor 1103 or implemented by the processor 1103.
  • the processor 1103 may be an integrated circuit chip with signal processing capability. In the implementation process, the steps of the foregoing method can be completed by an integrated logic circuit of hardware in the processor 1103 or instructions in the form of software.
  • the aforementioned processor 1103 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller. It may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the processor 1103 can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
  • the software module may be located in a storage medium well established in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register.
  • the storage medium is located in the memory 1104, and the processor 1103 reads the information in the memory 1104, and completes the steps of the foregoing method in combination with its hardware.
  • the wired or wireless network interface 1101 is used to implement the signal sending and signal receiving functions of the computer device 110.
  • the input/output interface 1102 can be used to receive input digital or character information and to output digital or character information. It can also be used to send instructions to the disk group through the first interface to modify the data in the disk group, and it may also include a display device such as a display screen.
  • the application processor 11031 is configured to implement the functions of the reset system in the embodiments corresponding to FIG. 2 to FIG. 4. For the specific manner in which the application processor 11031 implements these functions and the resulting beneficial effects, refer to the descriptions in the method embodiments corresponding to FIG. 2 to FIG. 4; details are not repeated here.
  • the application processor 11031 is configured to implement the functions of the data processing system in the embodiment corresponding to FIG. 5 or FIG. 6. For the specific manner in which the application processor 11031 implements these functions and the resulting beneficial effects, refer to the descriptions in the method embodiments corresponding to FIG. 5 or FIG. 6; details are not repeated here.
  • the embodiment of the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium stores a program for generating the driving speed of a vehicle.
  • when the program runs on a computer, the computer executes the steps performed by the reset system in the method described in the embodiments shown in FIG. 2 to FIG. 4, or the steps performed by the data processing system in the method described in the embodiment shown in FIG. 5 or FIG. 6.
  • an embodiment of this application further provides a computer program product which, when run on a computer, causes the computer to execute the steps performed by the reset system in the method described in the embodiments shown in FIG. 2 to FIG. 4, or the steps performed by the data processing system in the method described in the embodiment shown in FIG. 5 or FIG. 6.
  • An embodiment of this application further provides a circuit system. It includes a processing circuit configured to execute the steps performed by the reset system in the method described in the embodiments shown in FIG. 2 to FIG. 4, or the steps performed by the data processing system in the method described in the embodiment shown in FIG. 5 or FIG. 6.
  • the reset system or data processing system provided by the embodiment of the present application may specifically be a chip.
  • the chip includes a processing unit and a communication unit.
  • the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit.
  • the processing unit can execute the computer-executable instructions stored in the storage unit, so that the chip executes the reset method described in the embodiments shown in FIG. 2 to FIG. 4, or the data processing method described in the embodiment shown in FIG. 5 or FIG. 6.
  • the storage unit is a storage unit in the chip, such as a register or a cache. Alternatively, the storage unit may be located outside the chip in the wireless access device. Examples include a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).
  • the processor mentioned anywhere above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits configured to control execution of the program of the method in the first aspect.
  • the apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units. They may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the connection relationship between the modules indicates that they have a communication connection between them, which can be specifically implemented as one or more communication buses or signal lines.
  • this application can be implemented by software plus the necessary general-purpose hardware. It can also be implemented by dedicated hardware, including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. Generally, any function performed by a computer program can easily be implemented with corresponding hardware, and the specific hardware structure used to implement the same function can vary: for example, an analog circuit, a digital circuit, or a dedicated circuit. For this application, however, a software implementation is the better choice in most cases. Based on this understanding, the technical solutions of this application essentially, or the part contributing to the prior art, may be embodied in the form of a software product.
  • the computer software product is stored in a readable storage medium, such as a computer floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc. It includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments of this application.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center. Transmission may be wired (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (for example, infrared, radio, or microwave).
  • the computer-readable storage medium may be any usable medium that a computer can access, or a data storage device, such as a server or a data center, that integrates one or more usable media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, and a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).
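The save, backfill, and reverse-replacement flow described in the bullets above can be sketched as a small software model. This is a hedged illustration, not the patented implementation. It models the variants in which the first module is reset and its state is restored from non-volatile storage. The classes `FirstModule` and `DataProcessingSystem`, the dictionary standing in for the non-volatile storage medium, the integer unit locations, and the `reverse` flag are all hypothetical. The parity/ECC check that precedes reverse replacement is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class FailureReplacementInfo:
    failed_unit: int  # location of the first (failed, replaced) storage unit
    backup_unit: int  # location of the second (backup) storage unit


@dataclass
class FirstModule:
    # Hypothetical model of the first module integrated in the memory controller.
    first_register: List[FailureReplacementInfo] = field(default_factory=list)
    second_storage: Dict[int, str] = field(default_factory=dict)  # "first data"

    def reset(self) -> None:
        # Resetting the module clears everything it holds.
        self.first_register.clear()
        self.second_storage.clear()


@dataclass
class DataProcessingSystem:
    memory: Dict[int, str]  # storage-unit location -> stored data
    module: FirstModule = field(default_factory=FirstModule)
    nv_storage: Dict[str, object] = field(default_factory=dict)  # stands in for the NV medium

    def replace_failed_unit(self, info: FailureReplacementInfo) -> None:
        # Failure replacement: copy the data into the backup unit and record it.
        self.memory[info.backup_unit] = self.memory[info.failed_unit]
        self.module.first_register.append(info)
        self._persist()

    def _persist(self) -> None:
        # The processor core writes the current state to the non-volatile medium.
        self.nv_storage["info_set"] = list(self.module.first_register)
        self.nv_storage["first_data"] = dict(self.module.second_storage)

    def read(self, unit: int) -> str:
        # The memory controller redirects accesses using the first register.
        for info in self.module.first_register:
            if info.failed_unit == unit:
                unit = info.backup_unit
        return self.memory[unit]

    def reset(self, kind: str, reverse: bool = False) -> None:
        self.module.reset()
        if kind == "cold":
            # Cold reset: the memory is re-initialized and the saved set is not
            # read back. Discarding the saved copy here is an assumption.
            self.memory.clear()
            self.nv_storage.clear()
            return
        info_set = self.nv_storage.get("info_set", [])
        if reverse:
            # Reverse replacement: move data from each backup unit back into its
            # first storage unit, newest first so chained replacements unwind.
            for info in reversed(info_set):
                self.memory[info.failed_unit] = self.memory[info.backup_unit]
            self._persist()  # the now-empty register replaces the saved set
        else:
            # Warm reset: backfill the info set and first data into the module.
            self.module.first_register.extend(info_set)
            self.module.second_storage.update(self.nv_storage.get("first_data", {}))
```

In this model, a warm reset without `reverse` restores the access redirection from the persisted set. A warm reset with `reverse=True` moves the data back to its original units and leaves the first register empty. A cold reset discards everything.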

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Retry When Errors Occur (AREA)
  • Hardware Redundancy (AREA)

Abstract

The embodiments of the present application disclose a reset system, a data processing system, and a related device, which can be applied in the field of managing memory data. The reset system comprises a reset control circuit, a processor core, and a first register. Failure replacement information recorded by the first register comprises location information of a first storage unit, which is the storage unit that has failed when failure replacement is performed on storage units in a memory. In response to an obtained reset signal, the reset control circuit sends a reset instruction to a second module, which comprises the processor core and does not comprise the first module. A new concept, failure replacement information, is proposed, and a first register dedicated to recording it is added. After a reset operation is complete, the data in the memory can be accessed correctly according to the failure replacement information. As a result, the data in the memory is not lost when failure replacement technology and reset technology are both used.

Description

A reset system, a data processing system, and a related device
This application claims priority to Chinese Patent Application No. 202010588804.3, filed with the Chinese Patent Office on June 24, 2020 and entitled "Reset system, data processing system, and related device", which is incorporated herein by reference in its entirety.
Technical field
This application relates to the field of computer technology, and in particular, to a reset system, a data processing system, and a related device.
Background
As memory capacity and memory speed increase, the memory failure rate continues to rise. If a failed storage unit in the memory is not handled in time, it can easily cause uncorrectable errors (UCEs) such as system downtime, which in turn lead to hardware returns. Currently, the probability of a memory UCE can be reduced by performing failure replacement processing on failed storage units in the memory before a UCE occurs. Failure replacement processing means writing the data in a failed storage unit into a backup storage unit in the memory, thereby isolating the failed storage unit.
However, failure replacement changes how data is distributed in the memory. If the system is later reset, the data in the memory can no longer be accessed correctly after the reset, and the memory data is lost.
Therefore, how to ensure that no data in the memory is lost while using both failure replacement technology and reset technology has become an urgent problem to be solved.
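To make the problem concrete, the following minimal Python sketch models failure replacement as a remapping from a failed storage unit to a backup unit. The function names `fail_over` and `access` and the string unit names are hypothetical; a real memory controller keeps this state in hardware. The sketch shows that once the replacement state is wiped, as a reset might do, accesses fall through to the isolated unit.

```python
from typing import Dict, Optional

Memory = Dict[str, Optional[str]]  # storage-unit name -> stored data


def fail_over(memory: Memory, remap: Dict[str, str], failed: str, backup: str) -> None:
    # Failure replacement: copy the data into the backup unit and isolate the failed one.
    memory[backup] = memory[failed]
    remap[failed] = backup
    memory[failed] = None  # contents of the isolated unit are no longer trusted


def access(memory: Memory, remap: Dict[str, str], unit: str) -> Optional[str]:
    # Follow the replacement map, if any, before reading the storage unit.
    return memory[remap.get(unit, unit)]
```

Starting from `memory = {"row0": "data", "spare": None}`, calling `fail_over(memory, remap, "row0", "spare")` makes `access(memory, remap, "row0")` return `"data"`. After `remap.clear()`, the same call returns `None`, which is the data loss the background describes.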
Summary of the invention
This application provides a reset system, a data processing system, and a related device. It proposes a new concept, failure replacement information, and adds a first register dedicated to recording that information, which ensures the information is not lost during a reset. After the reset operation is completed, the data in the memory can therefore be accessed correctly according to the failure replacement information, so no data in the memory is lost while failure replacement technology and reset technology are both used.
To resolve the foregoing technical problem, this application provides the following technical solutions:
According to a first aspect, this application provides a reset system, which can be used in the field of managing memory data. The reset system includes a reset control circuit, a processor core, and a first module. The first module includes a first register used to store failure replacement information. One piece of failure replacement information includes location information of the first storage unit corresponding to one failure replacement operation. The first storage unit is the storage unit that has failed when failure replacement is performed on storage units in the memory, that is, the storage unit that gets replaced. The first register may specifically be a status register or a configuration register. The reset control circuit is configured to obtain a warm reset signal and, in response, send a reset instruction to a second module. The second module includes the processor core and does not include the first module; in other words, the reset control circuit sends the reset instruction to the processor core but not to the first module. The reset instruction triggers the reset operation in such a way that the failure replacement information in the first register is not cleared after the reset operation is completed. Resetting means restoring the module, unit, or apparatus being reset to the state it was in when first powered on. The warm reset signal is used to trigger a warm reset operation. The reset instruction may be a set of low-level signals containing at least one low-level signal, or a set of electrical signals containing both low-level and high-level signals.
In this implementation, a new concept, failure replacement information, is proposed, and a first register dedicated to storing it is added to the reset system. During the reset operation, the first module is controlled not to reset, so the failure replacement information in the first register survives the reset. Failure replacement processing may have isolated and replaced some storage units in the memory. Even so, after the system is reset it can use this information to determine which storage units are isolated failed units and avoid the system downtime that accessing them would cause. The memory is therefore accessed correctly, and no data in the memory is lost while failure replacement technology and reset technology are both used.
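The dispatch performed by the reset control circuit can be sketched as a function from the reset signal to the level driven on each reset line. This is an illustrative model only. The line names and the active-low convention (0 asserts reset, 1 releases it) are assumptions, and so is the cold-reset branch, which is based on the cold-reset behaviour described for the reset control circuit 701. None of it describes any specific hardware.

```python
from enum import Enum
from typing import Dict


class ResetSignal(Enum):
    WARM = "warm"
    COLD = "cold"


# Hypothetical reset-line names; a level of 0 asserts reset (active-low).
LINES = ("processor_core", "first_module")


def reset_instruction(signal: ResetSignal) -> Dict[str, int]:
    """Return the level driven on each reset line for the given reset signal."""
    levels = {line: 1 for line in LINES}  # 1 = released, no reset
    levels["processor_core"] = 0          # the second module is always reset
    if signal is ResetSignal.COLD:
        levels["first_module"] = 0        # a cold reset also resets the first module
    return levels
```

A warm reset drives the processor-core line low and leaves the first-module line high. This matches the description of a reset instruction made up of both low-level and high-level signals.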
In a possible implementation of the first aspect, the first module records one or more pieces of fault replacement information. Each piece further includes the location information of the second storage unit involved in the corresponding fault replacement operation. The second storage unit is the backup storage unit used when a storage unit in the memory undergoes fault replacement; that is, it is the storage unit that takes the faulty unit's place.
In this implementation, the fault replacement information includes at least the location of the replaced storage unit and the location of the replacement storage unit. The fault replacement information thus records each fault replacement operation performed in the memory. It shows which storage units are currently replaced and isolated because of faults, and also in which storage unit the relocated data now resides. This gives a direct picture of how data in the memory is distributed across storage units.
In a possible implementation of the first aspect, one piece of fault replacement information further includes the fault replacement type of the corresponding operation. The type may be any one of the following: memory module replacement, rank replacement, device replacement, bank replacement, row replacement, or cell replacement.
In a possible implementation of the first aspect, the granularity of the faulty first storage unit is any one of the following: cell, row, bank, device, rank, or memory module. A cell is the smallest-granularity storage unit in the memory. A row includes multiple cells, a bank includes multiple rows, a device includes multiple banks, a rank includes multiple devices, and a memory module includes one or two ranks.

In this implementation, the storage unit may be of any of these granularities, so the fault replacement information can describe a fault replacement operation at any of them. This solution therefore supports fault replacement at any granularity, which makes it more flexible to implement.
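The fault replacement information and the containment hierarchy described above can be sketched as a small record type. This is an illustrative assumption, not a format defined by the application. A location is modelled as a tuple `(module, rank, device, bank, row, cell)`, and a unit at a given granularity is identified by the matching prefix of that tuple. `FaultReplacementInfo`, `covers`, and `redirect` are hypothetical names.

```python
from dataclasses import dataclass
from enum import IntEnum


class Granularity(IntEnum):
    # Ordered from coarsest to finest. The value is the length of the
    # location-tuple prefix that identifies a unit at this granularity.
    MODULE = 1
    RANK = 2
    DEVICE = 3
    BANK = 4
    ROW = 5
    CELL = 6


@dataclass(frozen=True)
class FaultReplacementInfo:
    """Hypothetical layout of one fault replacement record."""
    granularity: Granularity  # fault replacement type
    failed_unit: tuple        # location of the first (faulty, replaced) unit
    backup_unit: tuple        # location of the second (backup) unit

    def covers(self, cell_address: tuple) -> bool:
        """True if a full cell address lies inside the isolated faulty unit."""
        n = int(self.granularity)
        return cell_address[:n] == self.failed_unit[:n]


def redirect(cell_address: tuple, records: list) -> tuple:
    """Map an access to the backup unit if its target unit was replaced."""
    for rec in records:
        if rec.covers(cell_address):
            n = int(rec.granularity)
            # Swap the replaced prefix and keep the lower-level offsets.
            return rec.backup_unit[:n] + cell_address[n:]
    return cell_address
```

With this layout, one record type describes a replacement at any granularity. A linear search is used only for clarity.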
In a possible implementation of the first aspect, the first module further includes at least one second storage unit. When at least one of the first storage units is a faulty cell, a second storage unit in the first module stores that cell's data. The second storage units configured in the first module may be of cell granularity, row granularity, or another granularity. The first module may contain, for example, 32, 64, or 128 second storage units.

In this embodiment, suppose at least one first storage unit has cell granularity. During fault handling, the data in the faulty cell is written into a backup storage unit in the first module. No reset instruction is sent to the first module during the reset process, so the data in the backup storage unit is not cleared after the reset operation completes. This preserves data integrity.
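A hypothetical model of the first module's spare cells follows. The default capacity of 32 matches one of the example counts given above. The dictionary-based lookup and the names `SpareCellPool`, `replace`, and `read` are assumptions made for illustration.

```python
class SpareCellPool:
    """Hypothetical pool of second storage units held inside the first module."""

    def __init__(self, capacity: int = 32):
        self.capacity = capacity
        self.slots = {}  # faulty cell address -> data now held in a spare cell

    def replace(self, cell_address, data) -> bool:
        """Move a faulty cell's (ECC-corrected) data into a spare cell."""
        if cell_address in self.slots:
            self.slots[cell_address] = data
            return True
        if len(self.slots) >= self.capacity:
            return False  # pool exhausted; handling this is outside the sketch
        self.slots[cell_address] = data
        return True

    def read(self, cell_address, memory: dict):
        # Accesses to a replaced cell are served from the spare cell;
        # all other accesses go to the memory as usual.
        if cell_address in self.slots:
            return self.slots[cell_address]
        return memory[cell_address]
```

The spare cells live in the first module, which a warm reset does not touch. Their contents therefore survive alongside the fault replacement information.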
In a possible implementation of the first aspect, the system includes a memory controller, and the first module is integrated in the memory controller. The reset control circuit is then specifically configured to send the reset instruction to the processor core and not to the memory controller.

In this implementation, the fault replacement information in the first module indicates how data in the memory is distributed across storage units, and the memory controller is what manages the memory. Integrating the first module into the memory controller makes it easy for the memory controller to manage the first module and to read the fault replacement information when managing the memory. In addition, keeping the entire memory controller out of reset prevents different modules within the memory controller from falling out of sync after the reset.
In a possible implementation of the first aspect, the reset control circuit is further configured to send the reset instruction to both the processor core and the first module when it obtains a cold reset signal. The cold reset signal triggers a cold reset operation. A cold reset restores the entire reset system and the memory to their first power-on state, and can generally be performed by a power cycle.

In this implementation, a cold reset signal indicates that the reset was triggered by a memory fault. The memory itself must then be reset, so the data in it will be cleared and there is no longer any need to preserve it. The first module is therefore reset as well. After the reset completes, new fault replacement information can be written to the first module, keeping the entire reset system in sync.
In a possible implementation of the first aspect, the reset control circuit may include a logic circuit. When the reset control circuit obtains a warm reset signal, its output is not coupled to the first module. When it obtains a cold reset signal, its output is coupled to the first module.
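One possible realization of this gating is sketched below. The application only says "a logic circuit", so the gate itself is an assumption made for illustration. The sketch treats the reset instruction as an active-low line, consistent with the low-level signals mentioned earlier, and ORs it with the inverse of a cold-reset flag. `reset_targets` and `first_module_rst_n` are hypothetical names.

```python
from enum import Enum


class ResetSignal(Enum):
    WARM = "warm"
    COLD = "cold"


def reset_targets(signal: ResetSignal) -> set:
    """Return the modules that receive the reset instruction."""
    targets = {"processor_core"}
    # The first module is coupled to the reset output only for a cold reset.
    if signal is ResetSignal.COLD:
        targets.add("first_module")
    return targets


def first_module_rst_n(core_rst_n: int, is_cold: int) -> int:
    """Active-low reset line seen by the first module (assumed OR-gate gating).

    For a cold reset the first module follows the core's reset line; for a
    warm reset the line is held high (inactive), i.e. decoupled.
    """
    return core_rst_n | (1 - is_cold)
```

At the gate level, the first module's reset line is asserted (0) only when the core's line is asserted and the cold-reset flag is set.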
In a possible implementation of the first aspect, the reset control circuit is further configured to send a first instruction to the first module. The first instruction instructs the first module not to perform a reset operation.
In a second aspect, this application provides a data processing system that can be used in the field of memory data management. The data processing system includes a processor core and a first module. The first module includes a first register used to store fault replacement information. The fault replacement information includes the location information of a first storage unit, which is the faulty storage unit at the time a storage unit in the memory undergoes fault replacement. The processor core is configured to obtain the fault replacement information from the first register and write it into a non-volatile storage medium. As a result, the fault replacement information is not lost when the processor core and the first module are reset.

In this implementation, the new concept of fault replacement information is introduced, and a first register dedicated to storing it is added to the system. After the memory controller writes fault replacement information into the first module, the processor core writes the newly generated information into the non-volatile storage medium. A reset of the data processing system therefore does not lose it. Fault replacement handling may have isolated and replaced some storage units in the memory. Even so, after the system is reset, the fault replacement information shows which storage units are isolated faulty units. This avoids system downtime caused by accessing an isolated faulty unit. The memory can be accessed correctly, and no memory data is lost while both the fault replacement technique and the reset technique are in use.
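The following sketches the processor core's save step. It assumes that the non-volatile storage medium is exposed as an ordinary file and that each piece of fault replacement information is a JSON-serializable dict. The function name and file format are hypothetical. Writing to a temporary file and then renaming it is a common way to avoid a half-written image if a reset interrupts the save.

```python
import json
import os


def persist_fault_replacement_info(first_register: list, nv_path: str) -> int:
    """Copy records from the first register into non-volatile storage.

    Records already saved are kept and only new ones are appended. Records
    should use lists rather than tuples so the duplicate check still matches
    after a JSON round trip. Returns the total number of records stored.
    """
    saved = []
    if os.path.exists(nv_path):
        with open(nv_path, "r", encoding="utf-8") as f:
            saved = json.load(f)
    for record in first_register:
        if record not in saved:
            saved.append(record)
    tmp_path = nv_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(saved, f)
        f.flush()
        os.fsync(f.fileno())  # push the data to the storage device
    os.replace(tmp_path, nv_path)  # swap in the new image in one step
    return len(saved)
```

Calling the function after every new fault replacement keeps the saved image in step with the first register.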
In a possible implementation of the second aspect, the fault replacement information further includes the location information of a second storage unit. The second storage unit is the backup storage unit used when a storage unit in the memory undergoes fault replacement.
In a possible implementation of the second aspect, the granularity of the faulty storage unit is any one of the following: cell, row, bank, device, rank, or memory module.
In a possible implementation of the second aspect, the system includes a memory controller, and the first module is integrated in the memory controller.
In a possible implementation of the second aspect, the processor core is further configured to act when the reset operation is a warm reset. It obtains a fault replacement information set from the non-volatile storage medium and, while the first register is being reset, backfills the set into the first register. The fault replacement information set includes at least one piece of fault replacement information.

In this implementation, the processor core writes the saved set straight back into the first module during the first module's reset. After the data processing system is reset, the memory controller can then directly use the fault replacement information in the first module to access the memory correctly. This is simple to operate and easy to implement.
In a possible implementation of the second aspect, the first module further includes at least one second storage unit. When at least one first storage unit is a cell, the second storage unit in the first module stores the first data, that is, the data of that cell-granularity first storage unit.

The processor core is further configured to obtain the first data from the second storage unit in the first module and write it into the non-volatile storage medium. As a result, the first data is not lost when the processor core and the first module are reset. When the reset operation is a warm reset, the processor core also obtains the fault replacement information set and the first data from the non-volatile storage medium. While the first module is being reset, it backfills the set into the first register and the first data into the second storage unit in the first module. The fault replacement information set includes at least one piece of fault replacement information.

In this implementation, the first data held in the first module's second storage unit is also saved to non-volatile storage and written back into the first module when the first module is reset. The first data is therefore not lost, and data integrity is preserved.
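The warm-reset backfill can be sketched as follows. The sketch assumes the saved image is JSON text with a `records` list (the fault replacement information set) and a `first_data` map (the spare-unit contents). `FirstModule` and `reset_and_backfill` are hypothetical names, and the cold-reset branch simply skips the restore.

```python
import json
from dataclasses import dataclass, field


@dataclass
class FirstModule:
    """Hypothetical model: first register plus spare (second) storage units."""
    first_register: list = field(default_factory=list)
    spare_units: dict = field(default_factory=dict)

    def reset(self) -> None:
        self.first_register.clear()
        self.spare_units.clear()


def reset_and_backfill(module: FirstModule, nv_image: str, warm: bool) -> None:
    """Reset the first module and, on a warm reset only, restore its contents.

    nv_image is the JSON text previously saved to non-volatile storage, with
    keys "records" (fault replacement information set) and "first_data"
    (spare-unit contents keyed by a string address).
    """
    module.reset()
    if not warm:
        return  # cold reset: the saved information is not read back
    saved = json.loads(nv_image)
    module.first_register.extend(saved["records"])
    module.spare_units.update(saved["first_data"])
```

After a warm reset, the memory controller sees the same fault replacement information and spare-cell data it had before the reset.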
In a possible implementation of the second aspect, the processor core is further configured to reset the first module in order to initialize it. When the reset operation is a warm reset, the processor core obtains the fault replacement information set from the non-volatile storage medium. It then performs a reverse replacement operation on the data in the memory's storage units according to that set. The set includes at least one piece of fault replacement information.

The reverse replacement operation rewrites the data in each second storage unit back into the corresponding first storage unit. This restores the distribution of memory data across storage units to its initial state. Restoring the initial state does not mean clearing the memory. It means storing the data in the layout used before any fault replacement was performed.
In this implementation, a storage unit may meet the fault replacement condition because of a fault in the processor core or the memory controller rather than in the unit itself. After the processor core and the memory controller are reset, such a storage unit may become usable again. Performing the reverse replacement operation on the memory data after that reset therefore frees the backup storage units, which helps extend the service life of the memory.
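A minimal sketch of the reverse replacement operation follows. It assumes that memory is modelled as a dict from unit address to data and that each record holds `failed` and `backup` addresses. Records are processed newest-first. This ordering is an assumption that handles the case where a backup unit was itself later replaced; the application does not specify an order. `reverse_replace` is a hypothetical name.

```python
def reverse_replace(memory: dict, records: list) -> list:
    """Copy data from each backup (second) unit back to its original (first) unit.

    memory maps a unit address to its data; records is the fault replacement
    information set, each item holding "failed" and "backup" addresses.
    Returns the backup addresses that were freed, newest first.
    """
    freed = []
    # Undo the most recent replacement first: if A was replaced by B and B
    # later by C, the data travels C -> B -> A.
    for record in reversed(records):
        failed, backup = record["failed"], record["backup"]
        memory[failed] = memory[backup]
        freed.append(backup)
    return freed
```

The data is not cleared. It is moved back to the layout used before fault replacement, and the backup units become available again.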
In a possible implementation of the second aspect, the processor core is further configured not to obtain the fault replacement information set from the non-volatile storage medium when the reset operation is a cold reset.
In a possible implementation of the second aspect, the processor core is further configured not to obtain the fault replacement information set or the first data from the non-volatile storage medium when the reset operation is a cold reset.
For the meaning of the terms, the specific implementation steps, and the beneficial effects of the second aspect and its possible implementations, refer to the descriptions of the possible implementations of the first aspect. Details are not repeated here.
In a third aspect, this application provides a reset method that can be used in the field of memory data management. The method is applied to a reset system that includes a reset control circuit, a processor core, and a first module. The first module includes a first register used to store fault replacement information. The fault replacement information includes the location information of a first storage unit, which is the faulty storage unit at the time a storage unit in the memory undergoes fault replacement.

The reset control circuit obtains a warm reset signal. In response, it sends a reset instruction to a second module that includes the processor core but not the first module. The reset instruction triggers execution of the reset operation.
The third aspect of this application can also perform the steps of the various implementations of the first aspect. For the specific implementation steps and beneficial effects of the third aspect and its possible implementations, refer to the descriptions of the possible implementations of the first aspect. Details are not repeated here.
In a fourth aspect, this application provides a data processing method that can be used in the field of memory data management. The method is applied to a data processing system that includes a processor core and a first module. The first module includes a first register used to store fault replacement information. The fault replacement information includes the location information of a first storage unit, which is the faulty storage unit at the time a storage unit in the memory undergoes fault replacement.

The processor core obtains the fault replacement information from the first register and writes it into a non-volatile storage medium. As a result, the fault replacement information is not lost when the processor core and the first module are reset.
The fourth aspect of this application can also perform the steps of the various implementations of the second aspect. For the specific implementation steps and beneficial effects of the fourth aspect and its possible implementations, refer to the descriptions of the possible implementations of the second aspect. Details are not repeated here.
In a fifth aspect, this application provides a computer device configured with the reset system of the first aspect or with the data processing system of the second aspect.
In a sixth aspect, this application provides a chip system. The chip system includes a processor configured to support implementation of the functions in the foregoing aspects, for example, sending or processing the data and/or information in the foregoing methods. In a possible design, the chip system further includes a memory configured to store the program instructions and data necessary for a server or communication device. The chip system may consist of chips, or may include chips and other discrete components.
Description of drawings
FIG. 1 is a schematic structural diagram of a reset system according to an embodiment of this application;
FIG. 2 is a schematic workflow diagram of the reset system according to an embodiment of this application;
FIG. 3 is a schematic diagram of the fault replacement technique in the reset method according to an embodiment of this application;
FIG. 4 is a system schematic diagram of the reset system according to an embodiment of this application;
FIG. 5 is a schematic workflow diagram of a data processing system according to an embodiment of this application;
FIG. 6 is a schematic diagram of the reverse replacement operation in the data processing method according to an embodiment of this application;
FIG. 7 is a system schematic diagram of the reset system according to an embodiment of this application;
FIG. 8 is another system schematic diagram of the reset system according to an embodiment of this application;
FIG. 9 is a system schematic diagram of the data processing system according to an embodiment of this application;
FIG. 10 is another system schematic diagram of the data processing system according to an embodiment of this application;
FIG. 11 is a schematic structural diagram of a computer device according to an embodiment of this application.
Detailed description
Embodiments of this application provide a reset system, a data processing system, and related devices. They introduce the new concept of fault replacement information and add a first register dedicated to recording it, which ensures that the fault replacement information is not lost during a reset. After the reset operation completes, data in the memory can therefore be accessed correctly according to the fault replacement information. No memory data is lost while both the fault replacement technique and the reset technique are in use.
The terms "first", "second", and so on in the specification, claims, and drawings of this application are used to distinguish similar objects. They do not necessarily describe a particular order or sequence. Terms used in this way are interchangeable where appropriate; they are merely how objects with the same attribute are distinguished when describing the embodiments.

In addition, the terms "include" and "have" and any variants of them are intended to cover non-exclusive inclusion. A process, method, system, product, or device that comprises a series of units is not necessarily limited to those units. It may include other units that are not expressly listed or that are inherent to the process, method, product, or device.
The following describes embodiments of this application with reference to the accompanying drawings. A person of ordinary skill in the art will appreciate that, as technology develops and new scenarios emerge, the technical solutions provided in the embodiments are equally applicable to similar technical problems.
The reset system provided in embodiments of this application is mainly applied to devices that process memory data. To aid understanding, the reset system is first introduced with reference to FIG. 1, which is a schematic structural diagram of the reset system according to an embodiment of this application. FIG. 1 can also be regarded as a schematic structural diagram of the data processing system according to an embodiment. The reset system includes a processor and a memory, which may be configured in an electronic device of any form. The processor integrates a processor core (core), a reset control circuit, a memory controller (double data rate SDRAM controller, DDRC), and a high-speed physical layer interface transceiver (high-speed physical layer, HSPHY).
A software system may run on the processor core to provide the basic functions of an operating system. The reset control circuit triggers modules or units in the processor to perform a reset operation, and also triggers the memory to perform a reset operation.
The memory controller translates the address in each access request issued by the processor core into a physical address in the memory and passes the request to the HSPHY. It also schedules access requests from the processor core efficiently and performs fault replacement operations on storage units in the memory.
The HSPHY is communicatively connected to the memory outside the processor. It receives digital signals generated by the memory controller, converts them into electrical signals, and transmits them to the memory. It also receives electrical signals generated by the memory, converts them into digital signals, and transmits them to the memory controller.
Note that, in an actual product, the processor may include more or fewer modules or units. The memory controller also need not be integrated in the processor; the memory controller and the processor may be two independent devices. FIG. 1 is merely an example of the application environment, provided to aid understanding, and does not limit this solution.
Based on the above description, an embodiment of this application provides a reset system in which a first module is added to store fault replacement information. The fault replacement information reflects how data is distributed in the memory. As long as it is not lost when the reset system is reset, the data in the memory can still be accessed correctly according to it after the reset. No memory data is then lost while both the fault replacement technique and the reset technique are in use.

Specifically, there are two implementations. In one, the first module is kept from resetting while the reset system is reset, so the fault replacement information is not lost. In the other, the fault replacement information in the first module is written into a non-volatile storage medium outside the reset system, so a reset of the reset system does not lose it either. The two approaches operate quite differently and are described separately below.
1. Fault replacement information is not cleared
In this embodiment of the application, refer to FIG. 2, which is a schematic workflow diagram of the reset system according to an embodiment of this application. The workflow of the reset system may include the following steps:
201. The processor core sends a fault replacement instruction to the memory controller.
In this embodiment, while the reset system is running, the processor core can learn in real time whether the memory has a fault. When a cell in the memory fails, the memory controller corrects the data in that cell using an error checking and correcting (ECC) algorithm. If the correction succeeds, the fault is reported to the processor core as a correctable error (CE). Description information for the CE is then generated and recorded.
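The application does not specify which ECC algorithm the memory controller uses. To illustrate what a correctable error means, the sketch below substitutes the textbook Hamming(7,4) single-error-correcting code. Real DIMM ECC typically uses wider SEC-DED or chip-level codes, and the function names here are hypothetical.

```python
def hamming74_encode(data4: int) -> list:
    """Encode 4 data bits into a 7-bit Hamming codeword (positions 1..7)."""
    bits = [0] * 8  # index 0 unused so list indices match bit positions
    bits[3], bits[5], bits[6], bits[7] = [(data4 >> i) & 1 for i in range(4)]
    # Each parity bit covers the positions whose index has that bit set.
    bits[1] = bits[3] ^ bits[5] ^ bits[7]
    bits[2] = bits[3] ^ bits[6] ^ bits[7]
    bits[4] = bits[5] ^ bits[6] ^ bits[7]
    return bits[1:]


def hamming74_decode(code: list):
    """Return (data, corrected); corrected=True models a correctable error."""
    bits = [0] + list(code)
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos]:
            syndrome ^= pos  # XOR of set positions is 0 for a valid codeword
    corrected = syndrome != 0
    if corrected:
        bits[syndrome] ^= 1  # a single flipped bit sits at the syndrome position
    data = bits[3] | (bits[5] << 1) | (bits[6] << 2) | (bits[7] << 3)
    return data, corrected
```

A non-zero syndrome that the decoder fixes corresponds to the CE case that would be reported to the processor core. This simple code cannot detect double-bit errors.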
The memory may include one or more memory modules. A memory module may include one or two ranks, a rank may include multiple devices, a device may include multiple banks, a bank may include multiple rows, and a row may include multiple cells. This division is based on storage capacity; in practice, memory can also be divided in other ways.
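The containment hierarchy above can be illustrated by converting between a flat cell index and a hierarchical location with mixed-radix arithmetic. The per-level counts in `GEOMETRY` are illustrative assumptions. Like the text, the model treats the hierarchy as pure containment; it does not reproduce how real DDR systems interleave addresses across devices.

```python
from math import prod

# Illustrative geometry (assumed, not from the application): number of child
# units per parent, ordered module -> rank -> device -> bank -> row -> cell.
LEVELS = ("module", "rank", "device", "bank", "row", "cell")
GEOMETRY = (2, 2, 8, 16, 1024, 1024)


def to_location(index: int, geometry=GEOMETRY) -> dict:
    """Convert a flat cell index into a hierarchical location (mixed radix)."""
    if not 0 <= index < prod(geometry):
        raise ValueError("index out of range")
    digits = []
    for radix in reversed(geometry):  # peel off the finest level first
        index, digit = divmod(index, radix)
        digits.append(digit)
    return dict(zip(LEVELS, reversed(digits)))


def to_index(location: dict, geometry=GEOMETRY) -> int:
    """Inverse of to_location."""
    index = 0
    for level, radix in zip(LEVELS, geometry):
        index = index * radix + location[level]
    return index
```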
The description information includes at least the location of the CE, which indicates the storage unit in the memory where the CE occurred. In the embodiments of this application, a storage unit in the memory may refer to one or more of the following: a memory cell, a row, a bank, a device, a rank, and a memory module.

Optionally, the description information may also include the CE type. CE types include, but are not limited to, CEs generated when the processor core accesses the memory, CEs generated during periodic memory inspection, and other types; the list is not exhaustive.

Because a storage unit may be at any of these granularities, the fault replacement information can describe a fault replacement operation at any of them. This solution therefore supports fault replacement at any granularity, which improves implementation flexibility.
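The storage-unit hierarchy and the CE description information described above can be modelled as a small data structure. This is a minimal Python sketch under stated assumptions. The class names (`Granularity`, `CEType`, `Location`, `CEDescription`) and the convention that unused finer fields are `None` are hypothetical; they do not represent any real controller's record format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Granularity(Enum):
    """Storage-unit granularities listed in the text, coarsest first."""
    DIMM = 0
    RANK = 1
    DEVICE = 2
    BANK = 3
    ROW = 4
    CELL = 5


class CEType(Enum):
    """CE sources named in the text; the list is not exhaustive."""
    CORE_ACCESS = "core access"
    PERIODIC_INSPECTION = "periodic inspection"
    OTHER = "other"


@dataclass(frozen=True)
class Location:
    """Hypothetical address of a storage unit (DIMM > rank > device > bank > row > cell).

    Fields finer than the unit's granularity are left as None, so a bank is
    identified by (dimm, rank, device, bank) with row and cell unset.
    """
    dimm: int
    rank: Optional[int] = None
    device: Optional[int] = None
    bank: Optional[int] = None
    row: Optional[int] = None
    cell: Optional[int] = None

    def granularity(self) -> Granularity:
        # The finest contiguous populated field decides the granularity.
        level = 0
        fields = [self.rank, self.device, self.bank, self.row, self.cell]
        for depth, value in enumerate(fields, start=1):
            if value is None:
                break
            level = depth
        return Granularity(level)


@dataclass(frozen=True)
class CEDescription:
    """Description information for one CE: location is required, type is optional."""
    location: Location
    ce_type: Optional[CEType] = None
```

With this layout, `Location(dimm=0, rank=10, device=5, bank=14)` names a bank-granularity unit. This single address type can serve any of the six replacement granularities.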
After obtaining the description information for the CE, the processor core can determine where in the memory the CE occurred. It then determines whether the storage unit meets a fault replacement condition. If the condition is met, the processor core sends a fault replacement instruction to the memory controller. Otherwise, it continues to monitor the memory.
Fault replacement can be divided into several types by granularity: memory module replacement, rank replacement, device replacement, bank replacement, row replacement, and cell replacement.
Correspondingly, the fault replacement conditions may include a memory module replacement condition, a rank replacement condition, a device replacement condition, a bank replacement condition, a row replacement condition, and a cell replacement condition.

For example, the memory module replacement condition may be either of the following:
- The number of CEs on the same memory module is greater than or equal to a first preset threshold.
- The number of CEs of the same type on the same memory module is greater than or equal to a second preset threshold.

Other conditions are also possible. The values of the first and second preset thresholds can be set flexibly according to the actual situation and are not limited here. The rank, device, bank, row, and cell replacement conditions work analogously to the memory module replacement condition and are not described again here.
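The threshold conditions above can be sketched as a counting check. This is a minimal Python illustration, assuming CE events arrive as `(unit, type)` pairs already reduced to the granularity being checked. The event format and the function name `units_meeting_condition` are hypothetical. The thresholds are parameters because the text leaves their values open.

```python
from collections import Counter
from typing import Hashable, Iterable, Optional, Tuple

# A CE event is modelled as (unit_id, ce_type). unit_id is any hashable
# identifier of the storage unit at the granularity being checked, e.g.
# "dimm0" or ("dimm0", "rank1"). This layout is an assumption.
CEEvent = Tuple[Hashable, str]


def units_meeting_condition(
    events: Iterable[CEEvent],
    first_threshold: int,
    second_threshold: Optional[int] = None,
) -> set:
    """Return the units that satisfy the fault replacement condition.

    Condition A: total CE count on a unit >= first_threshold.
    Condition B (optional): CE count of a single type on a unit >= second_threshold.
    """
    events = list(events)
    per_unit = Counter(unit for unit, _ in events)
    hits = {unit for unit, n in per_unit.items() if n >= first_threshold}
    if second_threshold is not None:
        per_unit_type = Counter(events)  # counts each (unit, type) pair
        hits |= {unit for (unit, _), n in per_unit_type.items() if n >= second_threshold}
    return hits
```

For example, two core-access CEs plus one inspection CE on `"d0"` meet a first threshold of 3. With a higher first threshold, the same unit can still qualify through the per-type second threshold.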
The fault replacement instruction carries at least the location of the faulty storage unit and the location of the backup storage unit:
- The location of the faulty storage unit may be expressed as a character string that encodes the storage unit to be replaced.
- The location of the backup storage unit may likewise be a character string that encodes the storage unit taking over its data.

Optionally, the instruction may also carry a fault replacement type, which may likewise be a character code. For example, 00 may denote bank replacement and 01 rank replacement. Other codes are possible and are not listed exhaustively here.
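As an illustration of the instruction contents, the sketch below bundles the two location strings with an optional type code. Only the codes 00 (bank) and 01 (rank) come from the text. The `|`-separated serialization, the location strings such as `"R10-D05-B14"`, and the names `FaultReplacementInstruction` and `make_instruction` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Replacement-type codes. Only these two examples appear in the text; codes for
# other granularities would be implementation-defined.
REPLACEMENT_TYPE_CODES = {"bank": "00", "rank": "01"}


@dataclass(frozen=True)
class FaultReplacementInstruction:
    faulty_location: str                    # code string of the unit to be replaced
    backup_location: str                    # code string of the unit that takes over
    replacement_type: Optional[str] = None  # optional type code, e.g. "00"

    def encode(self, sep: str = "|") -> str:
        """Serialize as [type|]faulty|backup (a hypothetical format)."""
        parts = [self.faulty_location, self.backup_location]
        if self.replacement_type is not None:
            parts.insert(0, self.replacement_type)
        return sep.join(parts)


def make_instruction(kind: str, faulty: str, backup: str) -> FaultReplacementInstruction:
    """Build an instruction carrying the type code for `kind` ("bank" or "rank")."""
    return FaultReplacementInstruction(faulty, backup, REPLACEMENT_TYPE_CODES[kind])
```

Keeping the type field optional mirrors the text: the two locations are mandatory, and the type is an extra hint.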
202. The memory controller performs fault replacement on the storage unit in the memory according to the received fault replacement instruction, and writes the fault replacement information into the first register.
In the embodiments of this application, on receiving the fault replacement instruction, the memory controller uses the two locations to learn which storage unit in the memory must be replaced and isolated, and where the backup storage unit is. It then performs the replacement. In one fault replacement operation, the memory controller reads the data out of the faulty storage unit and writes it into the backup storage unit.

The memory controller may be integrated in the processor or may be a device separate from the processor. Optionally, a fault replacement operation may also require the memory controller to reorganize data across storage units in the memory.
Note that the faulty storage unit is always located in the memory, but the backup storage unit is not necessarily:
- When the faulty storage unit is a memory cell, the backup storage unit that stores its data may be integrated in the first module. In that case, the memory controller writes the data of the faulty cell into a backup storage unit in the first module.
- When the faulty storage unit is a row, bank, device, rank, or memory module, the corresponding backup storage unit may be located in the memory.
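The replacement step, including the cell-granularity special case, can be sketched as below. This is a minimal Python model, not controller firmware. Memory is assumed to be a dictionary from unit identifiers to data. The `FirstModule` class and `replace_unit` function are hypothetical names. The default of 32 spare cells is one of the example sizes given later in the text.

```python
from typing import Dict, Hashable, List, Tuple


class FirstModule:
    """Hypothetical first module: replacement-info registers plus spare cells."""

    def __init__(self, n_spare_cells: int = 32) -> None:
        self.registers: List[Tuple[Hashable, Hashable]] = []  # (first unit, second unit)
        self.spare_cells: Dict[Hashable, bytes] = {}
        self.n_spare_cells = n_spare_cells


def replace_unit(
    memory: Dict[Hashable, bytes],
    first_module: FirstModule,
    faulty: Hashable,
    backup: Hashable,
    granularity: str,
) -> None:
    """One fault replacement: copy the faulty unit's data to its backup and log it."""
    data = memory[faulty]  # read out the (ECC-corrected) contents
    if granularity == "cell":
        # Cell-granularity backups live inside the first module, not in DRAM.
        if len(first_module.spare_cells) >= first_module.n_spare_cells:
            raise RuntimeError("no spare cell left in the first module")
        first_module.spare_cells[backup] = data
    else:
        # Row/bank/device/rank/DIMM backups are located in memory itself.
        memory[backup] = data
    # Record the replacement (first unit -> second unit) in a first register.
    first_module.registers.append((faulty, backup))
```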
For a more intuitive understanding of the fault replacement process, refer to FIG. 3, which is a schematic diagram of the fault replacement technique in the reset method provided in the embodiments of this application. FIG. 3 uses a memory device as the storage unit to be replaced and contains three sub-diagrams, (a), (b), and (c).

Sub-diagram (a) shows the data distribution in a memory module before the fault replacement operation. The module includes two ranks, Rank A and Rank B. Each rank contains 18 devices: 16 devices for normal data storage, one ECC device, and one parity device. The ECC device can also serve as a backup device. When one of devices 0 to 15 meets the fault replacement condition, its data is written into the ECC device, so that the faulty device is replaced and isolated.

Sub-diagram (b) shows device 1 in Rank A as faulty, so its data must be written into Rank A's ECC device. However, Rank A would then lose its error correction capability. To keep Rank A able to correct errors after the device replacement, the data of Rank A and Rank B can be reorganized so that the two ranks share the ECC device of Rank B. The two 16+2 layouts in sub-diagram (a) thus become the single 32+3 layout in sub-diagram (c).

The example in FIG. 3 is only intended to aid understanding of the fault replacement technique and does not limit this solution.
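The device arithmetic behind the change from two 16+2 layouts to one 32+3 layout is as follows. After the ranks are merged, the data still needs 32 devices, because the faulty device's data moved into a former ECC device. Each isolated device therefore reduces the check devices in the shared pool by one. The helper name `device_layout` is hypothetical.

```python
def device_layout(data_per_rank: int = 16, check_per_rank: int = 2,
                  ranks: int = 2, isolated: int = 0) -> tuple:
    """Return (data devices, check devices) once the ranks pool their check devices.

    Data devices stay at ranks * data_per_rank: a faulty device's data moves into
    a former ECC device rather than disappearing. Every remaining healthy device
    that is not holding data is available for error checking.
    """
    total = ranks * (data_per_rank + check_per_rank)
    data = ranks * data_per_rank
    check = total - isolated - data
    if check < 0:
        raise ValueError("not enough devices left to hold the data")
    return data, check
```

With the FIG. 3 numbers, `device_layout(isolated=1)` gives `(32, 3)`. Rank A on its own would be left with only its parity device: `device_layout(ranks=1, isolated=1)` gives `(16, 1)`. This is why the ranks are merged.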
After performing fault replacement on the storage unit in the memory, the memory controller writes the fault replacement information into the first register. Multiple fault replacement operations may occur while the reset system runs, and each piece of fault replacement information records one operation.

The fault replacement information includes the location of a first storage unit and the location of a second storage unit:
- The first storage unit is the storage unit that had the CE when the replacement was performed, that is, the unit that was replaced.
- The second storage unit is the backup storage unit used in the replacement, that is, the unit used after the replacement.

The fault replacement information therefore reflects how the data in the memory is distributed across storage units after faulty storage units have been replaced.
Optionally, a piece of fault replacement information may also include the granularity of the faulty storage unit, the CE type of the replaced faulty storage unit, or other information.

In the embodiments of this application, the fault replacement information includes at least the locations of the replaced storage unit and of its replacement, so it records each fault replacement operation performed in the memory. It shows which storage units in the memory have been replaced and isolated because of faults, and which storage units now hold the relocated data. This gives a direct picture of how the data in the memory is distributed across storage units.
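The sketch below shows how a reader of the fault replacement information could redirect an access away from an isolated unit. It is a minimal Python model. The record type mirrors the mandatory and optional fields listed above, but the names `FaultReplacementInfo` and `resolve` are hypothetical. Following chains of replacements, where a backup unit later failed and was itself replaced, is an assumption the text does not discuss.

```python
from dataclasses import dataclass
from typing import Hashable, Iterable, Optional


@dataclass(frozen=True)
class FaultReplacementInfo:
    first_unit: Hashable               # unit that had the CE, now isolated
    second_unit: Hashable              # backup unit now holding its data
    granularity: Optional[str] = None  # optional extra field
    ce_type: Optional[str] = None      # optional extra field


def resolve(unit: Hashable, records: Iterable[FaultReplacementInfo]) -> Hashable:
    """Follow replacement records so that an access never lands on an isolated unit."""
    mapping = {r.first_unit: r.second_unit for r in records}
    seen = set()
    while unit in mapping:
        if unit in seen:  # a cycle would mean the records are corrupt
            raise ValueError("cyclic replacement records")
        seen.add(unit)
        unit = mapping[unit]
    return unit
```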
To further illustrate the concept of fault replacement information, Table 1 gives an example in which the storage unit being replaced is a bank.
Table 1
Figure PCTCN2021102029-appb-000001
In Table 1, region0 means that the fault replacement operation described by this fault replacement information took place in the memory region numbered 0. The fields are as follows:
- region0-enable indicates whether a fault replacement has been performed in region0. The code 0 in region0-enable indicates that a fault replacement has been performed.
- region0-size indicates the granularity of the storage unit replaced in region0. The code 00 indicates a bank.
- region0-rank, region0-device, and region0-bank give the numbers of the rank, device, and bank of the memory block to be replaced in region0. Together they indicate the location of the first storage unit. As shown in Table 1, the first storage unit is bank 14 in device 5 of rank 10 in region0.
- region0-buddy-rank, region0-buddy-device, and region0-buddy-bank give the numbers of the rank, device, and bank of the backup memory block. Together they indicate the location of the second storage unit. As shown in Table 1, the second storage unit is bank 22 in device 13 of rank 18 in region0.

In practice, the fault replacement information may contain more or less information. The example in Table 1 only illustrates the concept and does not limit this solution.
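Because each first register holds one such record, the Table 1 fields can be pictured as bit fields packed into a register word. The field names follow Table 1, but the widths and ordering below are entirely hypothetical, since the text gives no register layout. The enable polarity (0 = replacement performed) is kept as described for Table 1.

```python
# Hypothetical bit layout, least significant field first; widths are assumptions.
FIELDS = [
    ("enable", 1),        # 0 = a replacement has been performed (per Table 1)
    ("size", 2),          # granularity code, 00 = bank
    ("rank", 5),
    ("device", 5),
    ("bank", 5),
    ("buddy_rank", 5),
    ("buddy_device", 5),
    ("buddy_bank", 5),
]


def pack(values: dict) -> int:
    """Pack named field values into one register word, checking each width."""
    word, shift = 0, 0
    for name, width in FIELDS:
        v = values[name]
        if not 0 <= v < (1 << width):
            raise ValueError(f"{name}={v} does not fit in {width} bits")
        word |= v << shift
        shift += width
    return word


def unpack(word: int) -> dict:
    """Recover the named fields from a register word."""
    out, shift = {}, 0
    for name, width in FIELDS:
        out[name] = (word >> shift) & ((1 << width) - 1)
        shift += width
    return out
```

Round-tripping the Table 1 values (rank 10, device 5, bank 14; buddy rank 18, device 13, bank 22) through `pack` and `unpack` returns them unchanged.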
The first register belongs to the first module and stores fault replacement information. It may be a status register, a configuration register, or another type of register; this is not limited here. The first module may be integrated in the memory controller. Each first register stores one piece of fault replacement information, and the first module may be configured with multiple first registers to record multiple pieces of fault replacement information.
Optionally, the first module may also be configured with at least one second storage unit. This unit stores the data of a first storage unit when that first storage unit is a faulty memory cell. In the embodiments of this application, no reset instruction is sent to the first module during a warm reset. The data in these backup storage units is therefore not cleared when the reset completes, which preserves data integrity.
The backup storage units configured in the first module may be at cell, row, or another granularity. The first module may contain 32, 64, 128, or another number of backup storage units.
Optionally, the reset system may also be configured with a second register that records the state of the memory controller's fault replacement operation. Possible states include:
- no fault replacement has occurred;
- fault replacement in progress;
- fault replacement succeeded;
- fault replacement failed;
- other states, which are not limited here.

The second register may also be integrated in the memory controller. The reset system may be configured with one or more groups of registers, each group including one first register and one second register.
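One register group, a first register paired with a second (status) register, can be sketched as a small state machine. The four states come from the text. The allowed transitions, the choice to treat success and failure as terminal, and the rule that the first register is written only on success are assumptions. The names `ReplacementStatus` and `RegisterGroup` are hypothetical.

```python
from enum import Enum
from typing import Any, Optional


class ReplacementStatus(Enum):
    """Second-register states listed in the text (others may exist)."""
    NONE = 0
    IN_PROGRESS = 1
    SUCCEEDED = 2
    FAILED = 3


# Hypothetical legal transitions; SUCCEEDED and FAILED are treated as terminal.
_ALLOWED = {
    ReplacementStatus.NONE: {ReplacementStatus.IN_PROGRESS},
    ReplacementStatus.IN_PROGRESS: {ReplacementStatus.SUCCEEDED, ReplacementStatus.FAILED},
    ReplacementStatus.SUCCEEDED: set(),
    ReplacementStatus.FAILED: set(),
}


class RegisterGroup:
    """One register group: a first register (info) and a second register (status)."""

    def __init__(self) -> None:
        self.first_register: Optional[Any] = None
        self.second_register = ReplacementStatus.NONE

    def set_status(self, new: ReplacementStatus) -> None:
        if new not in _ALLOWED[self.second_register]:
            raise ValueError(f"illegal transition {self.second_register.name} -> {new.name}")
        self.second_register = new

    def complete(self, info: Any) -> None:
        # Assumption: the replacement info is recorded only once the operation succeeded.
        self.set_status(ReplacementStatus.SUCCEEDED)
        self.first_register = info
```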
203. The reset control circuit obtains a reset signal.
In the embodiments of this application, a reset may be needed while the reset system is running; in that case, the reset control circuit obtains a reset signal. Resetting means restoring the module, unit, or device being reset to the state it had at first power-on. The reset control circuit may be integrated in the processor.

Optionally, after receiving the reset signal, the reset control circuit can determine whether it is a warm reset signal or a cold reset signal:
- A cold reset signal is generally caused by a memory fault and triggers a cold reset. A cold reset restores the entire reset system and the memory to their first power-on state, typically by power cycling.
- A warm reset signal is generally caused by a non-memory fault and triggers a warm reset. In a warm reset, some modules, units, or devices are not reset while the system is being reset.
Specifically, the reset control circuit can distinguish the two signals in either of two ways.

In one implementation, the reset control circuit may include a first pin and a second pin. A reset signal obtained on the first pin is a cold reset signal, and a reset signal obtained on the second pin is a warm reset signal. In this implementation, the reset signal may be a group of one or more low-level signals.

In another implementation, the reset control circuit obtains both cold and warm reset signals from the same signal source, and the two are different electrical signals. For example, a cold reset signal may be 01, 0101, or 0011, and a warm reset signal may be 10, 1010, or 1100, where "0" denotes a low level and "1" a high level. The reset control circuit can thus determine from the form of the received signal whether it is a cold or a warm reset signal.

These examples are only intended to aid understanding and do not limit this solution.
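Both classification schemes above can be sketched as follows. The pattern sets collect all the example encodings from the text; a real circuit would presumably fix one cold pattern and one warm pattern. Pin numbers 1 and 2 stand for the first and second pins. The names `ResetKind`, `classify_by_pin`, and `classify_by_pattern` are hypothetical.

```python
from enum import Enum
from typing import Optional


class ResetKind(Enum):
    COLD = "cold"
    WARM = "warm"


# Example encodings from the text ("0" = low level, "1" = high level).
COLD_PATTERNS = {"01", "0101", "0011"}
WARM_PATTERNS = {"10", "1010", "1100"}


def classify_by_pin(pin: int) -> ResetKind:
    """Scheme 1: the pin on which the reset arrives decides its kind."""
    if pin == 1:  # first pin
        return ResetKind.COLD
    if pin == 2:  # second pin
        return ResetKind.WARM
    raise ValueError(f"unknown reset pin {pin}")


def classify_by_pattern(levels: str) -> Optional[ResetKind]:
    """Scheme 2: one signal source; the level sequence decides the kind.

    Returns None for a sequence that matches neither example set.
    """
    if levels in COLD_PATTERNS:
        return ResetKind.COLD
    if levels in WARM_PATTERNS:
        return ResetKind.WARM
    return None
```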
204. The reset control circuit sends a reset instruction to a second module, where the second module includes the processor core and does not include the first module.
In some embodiments of this application, after obtaining the reset signal, the reset control circuit responds by making the processor core perform a reset while keeping both the first module and the memory from resetting. That is, the reset control circuit sends a reset instruction to the second module, which includes the processor core but not the first module.

Optionally, the second module may also include other modules of the reset system besides the first module, as long as the first module and the memory are not reset. Note that the second module may be a conceptual grouping rather than a physical module.
Specifically, in response to the reset signal, the reset control circuit sends a reset instruction to the processor core and sends none to the first module or the memory. The reset instruction triggers a reset, so the processor core is reset while the first module is not. The data stored in the first module is therefore not reset, that is, not cleared.

The reset instruction may be a group of signals containing at least one low-level signal, or a group containing both low-level and high-level signals, among other forms; this is not limited here.
Further, the behaviour depends on the type of reset signal:
- For a warm reset signal, the reset control circuit sends a reset instruction to the processor core but not to the first module or the memory.
- For a cold reset signal, it sends reset instructions to the processor core, the first module, and the memory.

The first module and the memory are therefore kept from resetting only when the reset control circuit obtains a warm reset signal.
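The dispatch rule above fits in one table-driven function. In this sketch, `KEEP` stands for either sending nothing or sending the explicit "first instruction" described next. The optional flag covers the variant in which the first module sits inside the memory controller, so the whole controller is spared. The function name `reset_plan` and the block labels are hypothetical.

```python
from typing import Dict

RESET = "reset"  # reset instruction sent
KEEP = "keep"    # no reset instruction (or an explicit "first instruction")


def reset_plan(kind: str, first_module_in_mc: bool = False) -> Dict[str, str]:
    """Map each block to RESET or KEEP for a given reset kind ("warm" or "cold").

    If the first module sits inside the memory controller, the whole memory
    controller is the block spared on a warm reset.
    """
    spared = "memory_controller" if first_module_in_mc else "first_module"
    blocks = ["processor_core", spared, "memory"]
    if kind == "cold":
        return {block: RESET for block in blocks}
    if kind == "warm":
        return {block: RESET if block == "processor_core" else KEEP for block in blocks}
    raise ValueError(f"unknown reset kind {kind!r}")
```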
Optionally, the reset control circuit may also send a first instruction to the first module and to the memory, where the first instruction indicates that no reset is to be performed. The processor core then resets on receiving the reset instruction, while the first module and the memory do not reset on receiving the first instruction. This achieves the same control: the processor core is reset and the first module and the memory are not.
Specifically, the reset control circuit may send the first instruction to the first module in either of two ways.

In the first way, the reset instruction and the first instruction are two different electrical signals. The reset control circuit sends the reset instruction to the processor core and the first instruction to the first module by sending them different signals. The first module determines from the type of the received signal whether it has received the first instruction. For example, the reset instruction may be 111000 and the first instruction 000111, where "0" denotes a low level and "1" a high level.

In the second way, the first module is provided with a third pin and a fourth pin:
- To send a reset instruction to the first module, the reset control circuit drives the third pin. The first module treats any instruction arriving on the third pin as a reset instruction.
- To send the first instruction, the reset control circuit drives the fourth pin. The first module treats any instruction arriving on the fourth pin as the first instruction.
The reset control circuit sends the first instruction to the memory in a similar way, so this is not described again here.
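From the first module's side, both ways of receiving the first instruction can be sketched as follows. The codes 111000 and 000111 are the examples from the text, and pins 3 and 4 stand for the third and fourth pins. The class `FirstModuleResetPort` and its methods are hypothetical. Clearing `replacement_info` models the loss of the first-register contents on reset.

```python
RESET_CODE = "111000"         # example reset instruction from the text
FIRST_INSTRUCTION = "000111"  # example "do not reset" instruction from the text


class FirstModuleResetPort:
    """Hypothetical model of how the first module reacts to reset-control inputs."""

    def __init__(self) -> None:
        self.replacement_info = []  # contents of the first registers

    def on_code(self, code: str) -> bool:
        """Way 1: decode the level pattern. Returns True if a reset was performed."""
        if code == RESET_CODE:
            self.replacement_info.clear()  # reset wipes the stored replacement info
            return True
        if code == FIRST_INSTRUCTION:
            return False                   # keep state across the reset
        raise ValueError(f"unrecognized instruction {code!r}")

    def on_pin(self, pin: int) -> bool:
        """Way 2: the third pin carries a reset, the fourth pin the first instruction."""
        if pin == 3:
            return self.on_code(RESET_CODE)
        if pin == 4:
            return self.on_code(FIRST_INSTRUCTION)
        raise ValueError(f"unknown pin {pin}")
```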
Further, the behaviour again depends on the type of reset signal:
- For a warm reset signal, the reset control circuit sends a reset instruction to the processor core and the first instruction to the first module and the memory.
- For a cold reset signal, it sends reset instructions to the processor core, the first module, and the memory.

The first module and the memory are therefore kept from resetting only when the reset control circuit obtains a warm reset signal.
Further optionally, if the first module is integrated in the memory controller, step 204 may proceed as follows. After obtaining the reset signal, the reset control circuit sends a reset instruction to the processor core but not to the memory controller. That is, the reset control circuit sends a reset instruction to the second module, which includes the processor core but not the memory controller.

The processor core is made to reset exactly as described above. The memory controller is kept from resetting in the same way as the first module was above, except that the target is now the entire memory controller rather than only the first module; the details are not repeated here.

In the embodiments of this application, the fault replacement information recorded in the first module describes how the data in the memory is distributed across storage units, and the memory controller manages the memory. Integrating the first module into the memory controller therefore makes it easy for the memory controller to manage the first module and to read the fault replacement information when managing the memory. In addition, keeping the entire memory controller from resetting prevents the modules inside the memory controller from falling out of sync after the reset.
205. The reset control circuit sends a reset instruction to the processor core and the first module.
In some embodiments of this application, when the reset control circuit determines that the reset signal is a cold reset signal, it sends a reset instruction to the processor core, the first module, and the memory, so that all three are reset. Optionally, the reset control circuit may also send reset instructions to other modules in the reset system.
Further, the reset control circuit may include a logic circuit that controls whether its output reaches the first module:
- When the reset control circuit obtains a warm reset signal, its output is not coupled to the first module.
- When it obtains a cold reset signal, its output is coupled to the first module.
In the embodiments of this application, a cold reset signal indicates that the reset was triggered by a memory fault. The memory must then be reset, which clears its data, so there is no longer any need to preserve the data in the memory. The first module is therefore also reset. After the reset completes, new fault replacement information can be written into the first module, and the whole reset system stays synchronized.
可选地,第一模块集成于内存控制器中,复位控制电路确定获取到的复位信号为冷复位信号的情况下,复位控制电路控制处理器内核和内存控制器执行复位操作,且复位控制电路会控制内存执行复位操作。具体实现方式与上述类似,区别在于将上述描述中的第一模块替换为内存控制器,此处不做赘述。Optionally, the first module is integrated in the memory controller, and when the reset control circuit determines that the acquired reset signal is a cold reset signal, the reset control circuit controls the processor core and the memory controller to perform the reset operation, and the reset control circuit Will control the memory to perform a reset operation. The specific implementation is similar to the above, with the difference that the first module in the above description is replaced with a memory controller, which is not repeated here.
To further illustrate this solution, refer to FIG. 4, which is a schematic diagram of a reset system according to an embodiment of this application. FIG. 4 takes as an example a first module integrated in the memory controller, with the memory controller integrated in the processor. When the reset control circuit obtains a cold reset signal, it sends reset instructions to the processor core, the memory controller, the HSPHY, and the memory, triggering the entire reset system and the memory to perform a reset operation. When the reset control circuit obtains a warm reset signal, it sends a reset instruction to the processor core only and sends none to the memory controller, the HSPHY, or the memory, so that the first module is not reset. It should be understood that the example in FIG. 4 is only intended to make the solution easier to understand and does not limit it.
This embodiment introduces the new concept of fault replacement information and adds to the reset system a first register dedicated to storing it. During the reset operation, the first module is controlled not to reset, so the fault replacement information in the first register survives the reset. Even if fault replacement processing on faulty storage units has caused some storage units in the memory to be isolated and replaced, after the system reset it is still possible to determine from this information which storage units are isolated faulty storage units. This avoids system downtime caused by accessing an isolated faulty storage unit, allows the memory to be accessed correctly, and ensures that data in the memory is not lost when both the memory fault replacement technique and the reset technique are used.
II. Backing up the fault replacement information
In this embodiment, refer to FIG. 5, which is a schematic workflow diagram of the data processing system according to an embodiment of this application. The workflow may include the following steps:
501. The processor core sends a fault replacement instruction to the memory controller.
502. According to the received fault replacement instruction, the memory controller performs fault replacement processing on storage units in the memory and writes the fault replacement information into the first register.
In this embodiment, steps 501 and 502 are implemented in the same way as steps 201 and 202 in the embodiment corresponding to FIG. 2; refer to the description above, which is not repeated here.
503. The processor core writes the fault replacement information into a non-volatile storage medium.
In some embodiments, after the memory controller writes the fault replacement information into the first register of the first module, the processor core can read it from the first register and write the newly generated fault replacement information into the non-volatile storage medium. The concepts of the first module and of fault replacement information were introduced in the embodiment corresponding to FIG. 2 and are not repeated here. The non-volatile storage medium may be a hard disk, a complex programmable logic device (CPLD), an electrically erasable programmable read-only memory (EEPROM), or another type of non-volatile storage medium. It may be located in the same device as the processor core or in a different device. The processor core and the non-volatile storage medium may exchange data through an internal interface, including but not limited to a bus, or through an external interface, such as a wired or wireless communication interface.
Specifically, the processor core reads the fault replacement information from the first register as follows. After writing the fault replacement information into the first register, the memory controller signals to the processor core that fault replacement is complete; on receiving this completion signal, the processor core reads the fault replacement information from the first module.
More specifically, as described for step 201 in the embodiment corresponding to FIG. 2, the data processing system is configured with a second register. After writing the fault replacement information into the first module, the memory controller writes an indication that the fault replacement operation succeeded into the second register (this is the completion signal). After reading the second register, the processor core determines that the memory controller has completed the fault replacement operation and copies the fault replacement information from the first register.
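The completion handshake in steps 502 and 503 can be sketched as follows, assuming a fault replacement record is a pair of integer addresses (faulty first storage unit, backup second storage unit) and that the processor core polls the second register. All class and function names are hypothetical, and clearing the flag after the copy is an assumption rather than something the application specifies.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# A fault replacement record: (faulty first storage unit, backup second
# storage unit). Integer addresses are an assumption for illustration.
Record = Tuple[int, int]


@dataclass
class MemoryController:
    first_register: List[Record] = field(default_factory=list)
    second_register: bool = False  # "fault replacement succeeded" flag

    def fault_replace(self, faulty: int, spare: int) -> None:
        # Step 502: record the mapping, then signal completion
        # through the second register.
        self.first_register.append((faulty, spare))
        self.second_register = True


def backup_replacement_info(mc: MemoryController, nvm: List[Record]) -> int:
    """Step 503: copy newly generated records to non-volatile storage.

    Returns the number of records copied. Nothing is copied until the
    second register reports completion.
    """
    if not mc.second_register:
        return 0
    new = [r for r in mc.first_register if r not in nvm]
    nvm.extend(new)
    mc.second_register = False  # acknowledge the completion signal (assumed)
    return len(new)
```

Copying only records not yet present lets steps 501 to 503 repeat several times before a single reset without duplicating entries.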
504. The processor core writes the first data stored in the second storage unit of the first module into the non-volatile storage medium.
In some embodiments, the first module may further include at least one second storage unit. When at least one first storage unit is a memory cell, the second storage unit in the first module stores the first data that was held in that memory cell.
When a faulty storage unit has memory-cell granularity, the memory controller can write the first data from the faulty storage unit into a second storage unit (that is, a backup storage unit) in the first module. After the memory controller writes the fault replacement information into the first register of the first module, the processor core can read the first data from that second storage unit and write it into the non-volatile storage medium, so that the first data is not lost when the processor core and the first module are reset.
Specifically, the processor core reads the first data from the backup storage unit in the first module as follows. After writing the fault replacement information into the first register, the memory controller signals to the processor core that fault replacement is complete; on receiving this completion signal, the processor core reads the first data from the backup storage unit in the first module. How the processor core determines that the memory controller has completed the fault replacement operation was described in step 503 and is not repeated here.
It should be noted that step 504 is optional: if no faulty storage unit has memory-cell granularity, step 504 is not needed. If step 504 is performed, this embodiment does not restrict its order relative to step 503: step 503 may be performed before step 504, after it, or at the same time.
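A minimal sketch of the optional step 504, assuming the non-volatile storage medium is modelled as a dictionary and the first module's second storage units as a mapping from a hypothetical address to its bytes. The granularity strings and the `first_data` key are illustrative labels, not values defined by the application.

```python
from typing import Dict, List


def backup_first_data(granularities: List[str],
                      spare_cells: Dict[int, bytes],
                      nvm: Dict[str, object]) -> bool:
    """Step 504 (optional): back up data held in the first module's
    second storage units.

    `granularities` lists the granularity of each faulty first storage
    unit (e.g. "cell", "row", "chip"). Only memory-cell faults park data
    in the first module, so the step is skipped when there are none.
    Returns True if a backup was written.
    """
    if "cell" not in granularities:
        return False
    # Snapshot copy, so later writes to the first module do not alter it.
    nvm["first_data"] = dict(spare_cells)
    return True
```

Because this routine writes only its own key, it does not interfere with whatever step 503 stores, which is consistent with the two steps being allowed to run in either order.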
505. The reset control circuit obtains a reset signal.
In this embodiment, step 505 is implemented in the same way as step 203 in the embodiment corresponding to FIG. 2; refer to the description above, which is not repeated here.
506. The reset control circuit sends a reset instruction to the processor core and the first module.
In some embodiments, after obtaining a reset signal, the reset control circuit sends a reset instruction to the processor core and the first module regardless of whether the signal is a warm reset signal or a cold reset signal, triggering both to perform a reset operation. Further, for a warm reset signal, the reset control circuit does not send a reset instruction to the memory, so the memory is not reset; for a cold reset signal, it sends a reset instruction to the memory, so the memory is reset. The form of the reset instruction was introduced in the embodiment corresponding to FIG. 2 and is not repeated here. Although the first module in FIG. 5 is integrated in the memory controller, in practice it may also be located outside the memory controller; this is not limited here.
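Step 506 can be sketched as the fan-out rule below. The block names are hypothetical. Unlike the FIG. 4 scheme, where a warm reset leaves the first module untouched, here the first module is reset on every reset because its contents are recovered from non-volatile storage afterwards.

```python
from enum import Enum


class ResetKind(Enum):
    WARM = "warm"
    COLD = "cold"


def reset_targets(kind: ResetKind) -> frozenset:
    """Blocks that receive a reset instruction in step 506.

    The processor core and the first module are always reset; the memory
    is reset only on a cold reset, so its contents survive a warm reset.
    """
    targets = {"processor_core", "first_module"}
    if kind is ResetKind.COLD:
        targets.add("memory")
    return frozenset(targets)
```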
Optionally, if the first module is integrated in the memory controller and the memory controller is integrated in the processor, the entire data processing system may take the form of a single processor. After obtaining the reset signal, the reset control circuit may then send a reset instruction to the whole processor so that the whole processor performs a reset operation.
507. The processor core determines whether the reset operation is a warm reset operation. If it is, the process proceeds to step 508; if it is a cold reset operation, the process proceeds to step 509.
In some embodiments, the reset control circuit is further provided with a third register that records whether the reset signal obtained this time is a cold reset signal or a warm reset signal. After receiving the reset instruction from the reset control circuit, the processor core queries the third register to determine whether the reset signal that triggered this reset is a warm reset signal, that is, whether this reset operation is a warm reset operation.
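A sketch of how the third register could drive the branch in step 507, assuming it simply holds the kind of the last reset signal. Class and function names are hypothetical, and treating an unrecorded value as a cold reset is an assumption made here for safety, not something the application specifies.

```python
from enum import Enum
from typing import Optional


class ResetKind(Enum):
    WARM = "warm"
    COLD = "cold"


class ResetControlCircuit:
    """Hypothetical model holding only the third register."""

    def __init__(self) -> None:
        # Third register: kind of the reset signal obtained most recently.
        self.third_register: Optional[ResetKind] = None

    def obtain(self, kind: ResetKind) -> None:
        # Step 505: record which kind of reset signal was obtained.
        self.third_register = kind


def next_step(rcc: ResetControlCircuit) -> int:
    """Step 507: go to step 508 on a warm reset, otherwise to step 509.

    Anything other than a recorded warm reset is treated as cold
    (assumption), so saved fault replacement information is never
    restored after the memory itself has been reset.
    """
    if rcc.third_register is ResetKind.WARM:
        return 508
    return 509
```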
508. The processor core performs a reset operation on the processor core and the first module.
In some embodiments, initialization software runs on the processor core. When the reset is determined to be a warm reset, the initialization software resets the processor core and the first module, and during reset startup it obtains a fault replacement information set from the non-volatile storage medium. Because more than one fault replacement operation may occur while the data processing system is running, and each piece of fault replacement information records the storage-unit replacement made in one fault replacement operation, what the processor core obtains from the non-volatile storage medium is a set containing one or more pieces of fault replacement information. The initialization software may be, for example, the basic input/output system (BIOS).
Optionally, if step 504 was performed, the initialization software in the processor core also obtains the first data from the non-volatile storage medium during reset startup.
Specifically, the processor core resets the first module as follows. In one implementation, while resetting the first register, the initialization software in the processor core backfills the fault replacement information set into the first register. Because the processor core obtains the set from the non-volatile storage medium and writes it directly back into the first module during the module's reset, the memory controller can, once the data processing system has been reset, use the fault replacement information in the first module directly to access the memory correctly. This approach is simple to operate and easy to implement.
More specifically, after triggering the reset of the processor core and the first module, the initialization software backfills the multiple pieces of fault replacement information it obtained into multiple first registers, respectively, while resetting those registers. Because configuration registers support only hardware writes, whereas status registers support both hardware and software writes, the first register in this implementation is a status register.
Optionally, if step 504 was performed, the initialization software in the processor core, while resetting the first module, backfills the fault replacement information set into the first register and backfills the first data into the second storage unit of the first module. The first data is backfilled into the second storage unit in the same way as the fault replacement information is backfilled into the first register, so the details are not repeated here. In this implementation, the first data stored in the second storage unit of the first module is also written into the non-volatile storage medium and restored to the first module when the first module is reset, so the first data is not lost and data integrity is preserved.
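The backfill in step 508 could look roughly like the sketch below, assuming one fault replacement record per software-writable status register and a non-volatile store modelled as a dictionary with hypothetical keys `fault_replacement_info` and `first_data`. The class and function names are likewise hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (faulty first storage unit, backup second storage unit); assumed layout.
Record = Tuple[int, int]


class FirstModule:
    """Hypothetical first module with software-writable status registers."""

    def __init__(self, n_registers: int) -> None:
        self.n_registers = n_registers
        self.reset()

    def reset(self) -> None:
        # A reset clears the first registers and the second storage units.
        self.status_registers: List[Optional[Record]] = [None] * self.n_registers
        self.spare_cells: Dict[int, bytes] = {}


def warm_reset_restore(fm: FirstModule, nvm: dict) -> None:
    """Step 508: reset the first module, then backfill it from NVM."""
    fm.reset()
    records = nvm.get("fault_replacement_info", [])
    if len(records) > fm.n_registers:
        raise ValueError("more records than first registers")
    for i, rec in enumerate(records):
        fm.status_registers[i] = rec  # one record per first register
    # Present only if the optional step 504 ran before the reset.
    fm.spare_cells.update(nvm.get("first_data", {}))
```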
In another implementation, the initialization software in the processor core resets the first module to initialize it and, according to the fault replacement information set, performs a reverse replacement operation on the data in the memory's storage units. The reverse replacement operation writes the data in each second storage unit back into the corresponding first storage unit, restoring the distribution of data across storage units to its initial state. Restoring the initial state does not mean clearing the data in the memory; it means storing the data in the memory in the storage mode used before fault replacement was performed. In this embodiment, a processor core fault or a memory controller fault may also cause a storage unit in the memory to meet the fault replacement condition; after the processor core and the memory controller are reset, such a storage unit may become usable again. Performing the reverse replacement operation after the reset therefore releases the backup storage units, which helps extend the service life of the memory.
More specifically, once the initialization software has reset and initialized the first module, the fault replacement information set recorded in the first module is cleared. Because each piece of fault replacement information records the replacement relationship between one first storage unit and one second storage unit, the initialization software can use the set obtained from the non-volatile storage medium to locate each first storage unit and its second storage unit, and write the data stored in the second storage unit back into the first storage unit; that is, it performs the reverse replacement operation on the data in the memory's storage units.
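The reverse replacement operation can be sketched as below. Memory is modelled as a dictionary from a hypothetical unit address to its contents, and each record is a (first storage unit, second storage unit) pair from the fault replacement information set. Undoing records newest-first is an assumption: it handles a backup unit that was itself later replaced, a case the application does not discuss.

```python
from typing import Dict, List, Tuple

# (first storage unit, second storage unit) addresses; integers are assumed.
Record = Tuple[int, int]


def reverse_replace(memory: Dict[int, bytes], records: List[Record]) -> List[int]:
    """Write each backup unit's data back into the unit it replaced.

    Returns the backup (second) storage units released by the operation.
    Records are undone newest-first so that, if a backup unit was later
    replaced in turn, its data is first pulled back from the newer backup
    before being copied to the original unit.
    """
    released = []
    for first, second in reversed(records):
        memory[first] = memory[second]
        released.append(second)
    records.clear()  # no replacement mapping remains in force
    return released
```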
Further, when the faulty storage unit has memory-chip granularity, the initialization software in the processor core also uses the data in the parity chip to check the data in the second storage unit. If an error is found, it uses the data in the ECC chip to correct the data in the second storage unit, and then writes the corrected data back into the first storage unit.
Correspondingly, the reverse replacement operation also requires the data in the memory chips to be reorganized.
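The application does not give the code used by the parity and ECC chips. As a bit-level stand-in only, the sketch below uses an extended Hamming(8,4) SECDED code: the overall parity bit plays the detecting role of the parity chip, and the Hamming syndrome plays the correcting role of the ECC chip. Function names are hypothetical, and real chip-level ECC differs substantially.

```python
from typing import List


def hamming84_encode(data: List[int]) -> List[int]:
    """Encode 4 data bits as an 8-bit extended Hamming (SECDED) codeword.

    Index 0 holds the overall parity bit; indices 1..7 follow the classic
    Hamming(7,4) layout with parity bits at positions 1, 2 and 4.
    """
    d1, d2, d3, d4 = data
    b = [0] * 8
    b[3], b[5], b[6], b[7] = d1, d2, d3, d4
    b[1] = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    b[2] = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    b[4] = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    b[0] = sum(b[1:]) % 2  # makes the parity of all 8 bits even
    return b


def check_and_correct(codeword: List[int]) -> List[int]:
    """Check a codeword, correct a single-bit error, return the 4 data bits.

    Raises ValueError for a detected but uncorrectable double-bit error.
    """
    b = list(codeword)
    syndrome = 0
    for i in range(1, 8):
        if b[i]:
            syndrome ^= i  # XOR of the positions of all set bits
    overall = sum(b) % 2
    if overall == 0 and syndrome != 0:
        raise ValueError("double-bit error detected; not correctable")
    if overall == 1:
        # Single-bit error at position `syndrome`; 0 means the overall
        # parity bit itself flipped and the data bits are intact.
        b[syndrome] ^= 1
    return [b[3], b[5], b[6], b[7]]
```

Data that passes the check, or has been corrected, would then be written back into the first storage unit; a double-bit error is reported rather than silently written back.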
To further illustrate this solution, refer to FIG. 6, which is a schematic diagram of the reverse replacement operation in the data processing method according to an embodiment of this application, continuing the example of FIG. 3. FIG. 6 contains two sub-diagrams, (a) and (b). Sub-diagram (a) shows the data distribution in a memory module before the reverse replacement operation: after the fault replacement operation, the data of chip 1 in Rank A has been written into the ECC chip of Rank A, and Rank A and Rank B share the ECC chip of Rank B. The reverse replacement operation must therefore rewrite the data in the ECC chip of Rank A into chip 1 of Rank A. Sub-diagram (b) shows the data distribution in the memory module after the reverse replacement operation: the processor core uses the parity chip of Rank A to check the data in the ECC chip of Rank A, finds no error, reads the data from the ECC chip of Rank A, and writes it into chip 1 of Rank A. The processor core also reorganizes the data of Rank A and Rank B, so that their storage mode reverts to two 16+2 storage modes and the distribution of data across storage units in the memory is restored to its initial state. It should be understood that the example in FIG. 6 is only intended to make the fault replacement technique easier to understand and does not limit this solution.
It should be noted that this embodiment does not restrict how many times steps 501 to 504 are executed relative to steps 505 to 508; for example, steps 505 to 508 may be executed once after steps 501 to 504 have been executed several times.
509. The processor core does not obtain the fault replacement information set from the non-volatile storage medium.
In some embodiments, when the processor core determines that this reset operation is a cold reset operation, it does not obtain the fault replacement information set from the non-volatile storage medium; instead, it directly resets the processor core, the first module, the memory controller, and the memory, that is, it initializes the entire data processing system.
In this embodiment, a cold reset signal indicates that the reset was triggered by a memory fault. The memory must then be reset, which clears its data, so there is no need to preserve the memory contents. Not obtaining the fault replacement information set from the non-volatile storage medium in this case avoids a redundant step and makes the reset process more efficient.
It should be noted that steps 507 and 509 are optional. If they are not performed, step 508 can be performed directly after step 506.
This embodiment introduces the new concept of fault replacement information and adds to the reset system a first register dedicated to storing it. After the memory controller writes fault replacement information into the first module, the processor core writes the newly generated information into the non-volatile storage medium, so resetting the data processing system does not lose it. Even if fault replacement processing on faulty storage units has caused some storage units in the memory to be isolated and replaced, after the system reset it is still possible to determine from this information which storage units are isolated faulty storage units. This avoids system downtime caused by accessing an isolated faulty storage unit, allows the memory to be accessed correctly, and ensures that data in the memory is not lost when both the memory fault replacement technique and the reset technique are used.
Based on the embodiments corresponding to FIG. 1 to FIG. 6, and to better implement the above solutions, related devices for implementing them are further provided below. Refer to FIG. 7, which is a schematic diagram of a reset system according to an embodiment of this application. The reset system 700 may include a reset control circuit 701, a processor core 7021, and a first module 703. The first module 703 includes a first register configured to store fault replacement information; the fault replacement information includes location information of a first storage unit, which is a storage unit found to be faulty when fault replacement is performed on storage units in the memory. The reset control circuit 701 is configured to obtain a warm reset signal, and is further configured to send, in response to the obtained warm reset signal, a reset instruction to a second module 702. The second module 702 includes the processor core 7021 but not the first module 703. The reset instruction is used to trigger a reset operation.
In a possible design, the fault replacement information further includes location information of a second storage unit, which is the backup storage unit used when fault replacement is performed on storage units in the memory.
In a possible design, the granularity of the first storage unit is any one of the following: memory cell, memory row, memory block, memory chip, memory plane, or memory module.
In a possible design, the first module 703 further includes at least one second storage unit. When at least one first storage unit is a memory cell, the second storage unit in the first module 703 stores the data of that memory cell.
In a possible design, refer to FIG. 8, which is a schematic diagram of a reset system according to an embodiment of this application. The reset system 700 includes a memory controller 704, and the first module 703 is integrated in the memory controller 704. The reset control circuit 701 is specifically configured to control the processor core 7021 to perform a reset operation and to control the memory controller 704 not to perform a reset operation.
In a possible design, the reset control circuit 701 is further configured to send a reset instruction to the processor core 7021 and the first module 703 when a cold reset signal is obtained.
In a possible design, the reset control circuit 701 is specifically configured to send a reset instruction to the processor core 7021 and a first instruction to the first module 703, where the first instruction instructs the first module 703 not to perform a reset operation.
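The design in which the first module receives an explicit "do not reset" instruction, rather than being cut off from the reset output, can be sketched as a dispatch table. The enum values and dictionary keys are hypothetical labels for the instructions and for blocks 7021 and 703.

```python
from enum import Enum
from typing import Dict


class ResetKind(Enum):
    WARM = "warm"
    COLD = "cold"


class Instruction(Enum):
    RESET = "reset"  # reset instruction: trigger a reset operation
    HOLD = "hold"    # the "first instruction": do not reset


def dispatch(kind: ResetKind) -> Dict[str, Instruction]:
    """Instructions sent by reset control circuit 701 (hypothetical model).

    On a warm reset, processor core 7021 gets a reset instruction while
    first module 703 gets the first instruction, so the first register
    keeps the fault replacement information. On a cold reset, both are
    reset.
    """
    first_module = Instruction.HOLD if kind is ResetKind.WARM else Instruction.RESET
    return {"processor_core_7021": Instruction.RESET,
            "first_module_703": first_module}
```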
It should be noted that the information exchange and execution processes among the modules/units of the reset system 700 are based on the same concept as the method embodiments corresponding to FIG. 2 to FIG. 4 of this application. For details, refer to the descriptions in those method embodiments; they are not repeated here.
An embodiment of this application further provides a data processing system. Refer to FIG. 9, which is a schematic diagram of a data processing system according to an embodiment of this application. The data processing system 900 includes a processor core 901 and a first module 902. The first module 902 includes a first register configured to store fault replacement information, which includes location information of a first storage unit, namely a storage unit found to be faulty when fault replacement is performed on storage units in the memory. The processor core 901 is configured to obtain the fault replacement information from the first register, and is further configured to write the fault replacement information into a non-volatile storage medium, so that the fault replacement information is not lost when the processor core 901 and the first module 902 perform a reset operation.
In a possible design, the fault replacement information further includes location information of a second storage unit. The second storage unit is the backup storage unit used when fault replacement is performed on storage units in the memory.
In a possible design, the granularity of the first storage unit is any one of the following: a memory cell, a memory row, a memory block, a memory chip, a memory side, or a memory module.
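The text does not say how one register entry encodes the granularity together with the locations of the first (faulty) and second (backup) storage units. The sketch below assumes a hypothetical 64-bit layout to illustrate the idea: a valid bit, a 3-bit granularity code, and two 30-bit location fields. The field widths, bit positions, and numeric granularity codes are all assumptions.

```python
from enum import IntEnum
from typing import Tuple


class Granularity(IntEnum):
    # The six granularities listed above; numeric codes are hypothetical.
    CELL = 0
    ROW = 1
    BLOCK = 2
    CHIP = 3
    SIDE = 4
    MODULE = 5


LOC_BITS = 30                   # hypothetical width of each location field
LOC_MASK = (1 << LOC_BITS) - 1
VALID_BIT = 63                  # bit 63: entry valid
GRAN_SHIFT = 60                 # bits 62..60: granularity code


def pack_entry(gran: Granularity, failed: int, backup: int) -> int:
    """Pack one fault replacement entry into a 64-bit register word."""
    if not (0 <= failed <= LOC_MASK and 0 <= backup <= LOC_MASK):
        raise ValueError("location does not fit in a 30-bit field")
    return (
        (1 << VALID_BIT)
        | (int(gran) << GRAN_SHIFT)
        | (failed << LOC_BITS)  # bits 59..30: first (faulty) unit
        | backup                # bits 29..0: second (backup) unit
    )


def unpack_entry(word: int) -> Tuple[Granularity, int, int]:
    """Decode a register word back into (granularity, failed, backup)."""
    if not (word >> VALID_BIT) & 1:
        raise ValueError("entry is not marked valid")
    gran = Granularity((word >> GRAN_SHIFT) & 0x7)
    return gran, (word >> LOC_BITS) & LOC_MASK, word & LOC_MASK
```

A real memory controller would fix these widths from the address map of the memory it manages. The round trip `unpack_entry(pack_entry(...))` is what the save and backfill steps rely on.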
In a possible design, refer to FIG. 10, which is a schematic diagram of a data processing system according to an embodiment of this application. The system 900 includes a memory controller 903, and the first module 902 is integrated in the memory controller 903.
In a possible design, the processor core 901 is further configured to act when the reset operation is a warm reset. In that case, it obtains a fault replacement information set from the non-volatile storage medium and backfills the set into the first register while the first register is being reset. The fault replacement information set includes at least one piece of fault replacement information.
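The warm-reset backfill reduces to a branch on the reset type. The same branch covers the cold-reset variant, in which the saved set is not fetched. Here is a minimal sketch. It assumes the register is modeled as a list of `(failed, backup)` tuples. `read_nv` is a hypothetical callback that reads the fault replacement information set from non-volatile storage.

```python
from enum import Enum
from typing import Callable, List, Tuple

# (failed_location, backup_location); a hypothetical entry format.
Entry = Tuple[int, int]


class ResetType(Enum):
    WARM = "warm"
    COLD = "cold"


def reset_first_register(
    kind: ResetType,
    register: List[Entry],
    read_nv: Callable[[], List[Entry]],
) -> List[Entry]:
    """Reinitialize the register and, on a warm reset only, backfill it."""
    register.clear()  # the register loses its contents as part of the reset
    if kind is ResetType.COLD:
        return register  # cold reset: the saved set is not fetched at all
    register.extend(read_nv())  # warm reset: restore the saved set
    return register
```

Passing the read as a callback makes it easy to confirm that a cold reset never touches non-volatile storage.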
In a possible design, the first module 902 further includes at least one second storage unit. When at least one first storage unit is a memory cell, a second storage unit in the first module 902 stores the first data held in that memory-cell first storage unit. The processor core 901 is further configured to obtain the first data from the second storage unit in the first module 902 and write it into the non-volatile storage medium. This ensures the first data is not lost when the processor core 901 and the first module 902 perform a reset operation. When the reset operation is a warm reset, the processor core 901 is further configured to obtain the fault replacement information set and the first data from the non-volatile storage medium. While the first module 902 is being reset, it backfills the fault replacement information set into the first register and the first data into the second storage unit in the first module 902. The fault replacement information set includes at least one piece of fault replacement information.
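When the failed unit is a single memory cell, the first module also holds that cell's data (the first data) in its own second storage unit. That data must therefore be saved and restored along with the replacement set. The sketch assumes a plain `dict` stands in for the non-volatile storage medium. `FirstModule`, `save_before_reset`, and `warm_reset` are hypothetical names.

```python
from typing import Dict, List


class FirstModule:
    """Hypothetical model: register entries plus spare (second) storage cells."""

    def __init__(self) -> None:
        self.register: List[int] = []          # locations of faulty memory cells
        self.spare_cells: Dict[int, int] = {}  # faulty cell -> its first data

    def reset(self) -> None:
        self.register.clear()
        self.spare_cells.clear()


def save_before_reset(module: FirstModule, nv: dict) -> None:
    # Persist both the replacement set and the first data held in spare cells.
    nv["replacement_set"] = list(module.register)
    nv["first_data"] = dict(module.spare_cells)


def warm_reset(module: FirstModule, nv: dict) -> None:
    module.reset()
    # Backfill during the reset: register entries first, then the cell data.
    module.register.extend(nv.get("replacement_set", []))
    module.spare_cells.update(nv.get("first_data", {}))
```

Copying the containers in `save_before_reset` matters: if the model stored references instead, `module.reset()` would also empty the "non-volatile" copy.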
In a possible design, the processor core 901 is further configured to perform a reset operation on the first module 902 to initialize it. When the reset operation is a warm reset, the processor core 901 is further configured to obtain the fault replacement information set from the non-volatile storage medium. It then performs an inverse replacement operation on the data in the storage units of the memory according to that set. The fault replacement information set includes at least one piece of fault replacement information. The inverse replacement operation rewrites the data in the second storage unit back into the first storage unit, so that the distribution of the memory's data across storage units is restored to its initial state.
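In the model below, the first module has been reset and no longer redirects accesses. The inverse replacement can then be pictured as copying each backup unit's data back to the unit it replaced. Here is a minimal sketch. It assumes memory is modeled as a `dict` from location to value and that the set is ordered oldest-first. Undoing entries newest-first is this sketch's own choice for handling chained replacements, not something the text specifies.

```python
from typing import Dict, List, Tuple

Entry = Tuple[int, int]  # (first/faulty location, second/backup location)


def inverse_replace(memory: Dict[int, int], replacement_set: List[Entry]) -> None:
    """Copy data from each backup unit back to the unit it replaced.

    Entries are undone newest-first (an assumption), so that a chain such as
    A -> B followed by B -> C unwinds as C -> B, then B -> A.
    """
    for failed, backup in reversed(replacement_set):
        memory[failed] = memory[backup]
```

Processing the chain oldest-first would instead copy the stale value of B into A before B had been restored from C.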
In a possible design, the processor core 901 is further configured not to obtain the fault replacement information set from the non-volatile storage medium when the reset operation is a cold reset.
It should be noted that the information exchange and execution processes among the modules/units in the data processing system 900 are based on the same concept as the method embodiments corresponding to FIG. 5 and FIG. 6 of this application. For details, refer to the descriptions in the foregoing method embodiments. They are not repeated here.
An embodiment of this application further provides a computer device. Refer to FIG. 11, which is a schematic structural diagram of the computer device according to an embodiment of this application. The reset system 700 described in the embodiment corresponding to FIG. 7 or FIG. 8 may be deployed on the computer device 110. There, it implements the functions of the reset system in the embodiments corresponding to FIG. 2 to FIG. 4. Alternatively, the data processing system 900 described in the embodiment corresponding to FIG. 9 or FIG. 10 may be deployed on the computer device 110. There, it implements the functions of the data processing system in the embodiment corresponding to FIG. 5 or FIG. 6. Specifically, the computer device 110 includes a wired or wireless network interface 1101, an input/output interface 1102, a processor 1103, and a memory 1104. The computer device 110 may have one or more processors 1103; FIG. 11 uses one processor as an example. The processor 1103 may include an application processor 11031 and a communication processor 11032. The memory 1104 may include a non-volatile storage medium 11041 and a main memory 11042. In some embodiments of this application, the wired or wireless network interface 1101, the input/output interface 1102, the processor 1103, and the memory 1104 may be connected through a bus or in another manner.
The main memory 11042 may include a read-only memory and a random access memory, and provides instructions and data to the processor 1103. Part of the non-volatile storage medium 11041 may further include a non-volatile random access memory (NVRAM). The memory 1104 stores operating instructions, executable modules, or data structures, or a subset or an extended set thereof. The operating instructions may include various operating instructions for implementing various operations.
The processor 1103 controls the operation of the computer device. In a specific application, the components of the computer device are coupled together through a bus system. In addition to a data bus, the bus system may include a power bus, a control bus, a status signal bus, and the like. For clarity, however, all buses are referred to as the bus system in the figure.
The methods disclosed in the foregoing embodiments of this application may be applied to, or implemented by, the processor 1103. The processor 1103 may be an integrated circuit chip with signal processing capability. During implementation, the steps of the foregoing methods may be completed by integrated hardware logic circuits in the processor 1103 or by instructions in the form of software. The processor 1103 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller. It may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor 1103 may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this application may be performed directly by a hardware decoding processor. Alternatively, they may be performed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well established in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1104. The processor 1103 reads information from the memory 1104 and completes the steps of the foregoing methods in combination with its hardware.
The wired or wireless network interface 1101 implements signal sending and receiving for the computer device 110. The input/output interface 1102 may be configured to receive input digital or character information and to output digital or character information. It may also be configured to send instructions to a disk group through a first interface to modify data in the disk group. The input/output interface 1102 may further include a display device such as a display screen.
In one case in the embodiments of this application, the application processor 11031 is configured to implement the functions of the reset system in the embodiments corresponding to FIG. 2 to FIG. 4. For how the application processor 11031 implements these functions, and for the resulting beneficial effects, refer to the descriptions in the method embodiments corresponding to FIG. 2 to FIG. 4. Details are not repeated here.
In another case in the embodiments of this application, the application processor 11031 is configured to implement the functions of the data processing system in the embodiment corresponding to FIG. 5 or FIG. 6. For how the application processor 11031 implements these functions, and for the resulting beneficial effects, refer to the descriptions in the method embodiments corresponding to FIG. 5 or FIG. 6. Details are not repeated here.
An embodiment of this application further provides a computer-readable storage medium that stores a program. When the program runs on a computer, the computer performs the steps performed by the reset system in the methods described in the embodiments shown in FIG. 2 to FIG. 4. Alternatively, it performs the steps performed by the data processing system in the methods described in the embodiment shown in FIG. 5 or FIG. 6.
An embodiment of this application further provides a computer program product. When the product runs on a computer, the computer performs the steps performed by the reset system in the methods described in the embodiments shown in FIG. 2 to FIG. 4. Alternatively, it performs the steps performed by the data processing system in the methods described in the embodiment shown in FIG. 5 or FIG. 6.
An embodiment of this application further provides a circuit system. The circuit system includes a processing circuit. The processing circuit is configured to perform the steps performed by the reset system in the methods described in the embodiments shown in FIG. 2 to FIG. 4. Alternatively, it is configured to perform the steps performed by the data processing system in the methods described in the embodiment shown in FIG. 5 or FIG. 6.
The reset system or data processing system provided in the embodiments of this application may specifically be a chip. The chip includes a processing unit and a communication unit. The processing unit may be, for example, a processor. The communication unit may be, for example, an input/output interface, a pin, or a circuit. The processing unit may execute computer-executable instructions stored in a storage unit. This causes the chip to perform the reset method described in the embodiments shown in FIG. 2 to FIG. 4, or the data processing method described in the embodiment shown in FIG. 5 or FIG. 6. Optionally, the storage unit is inside the chip, such as a register or a cache. Alternatively, the storage unit may be located outside the chip, in the device in which the chip resides. Examples include a read-only memory (ROM), another type of static storage device that can store static information and instructions, or a random access memory (RAM).
Any processor mentioned above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits configured to control execution of a program of the method in the foregoing first aspect.
In addition, it should be noted that the apparatus embodiments described above are merely examples. Units described as separate components may or may not be physically separate. Likewise, components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments. In the drawings of the apparatus embodiments provided in this application, a connection between modules indicates a communication connection between them. Such a connection may be implemented as one or more communication buses or signal lines.
From the descriptions of the foregoing implementations, a person skilled in the art can clearly understand that this application may be implemented by software plus necessary general-purpose hardware. It may also be implemented by dedicated hardware, including an application-specific integrated circuit, a dedicated CPU, a dedicated memory, dedicated components, and the like. Generally, any function performed by a computer program can easily be implemented by corresponding hardware. The specific hardware structure used to implement the same function may also vary, for example, an analog circuit, a digital circuit, or a dedicated circuit. For this application, however, a software implementation is preferable in most cases. Based on such an understanding, the technical solutions of this application essentially, or the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc. It includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of this application.
All or some of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof. When software is used, all or some of the embodiments may be implemented in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of this application are generated in whole or in part. The computer may be a general-purpose computer, a dedicated computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another. The transmission may be wired (for example, over a coaxial cable, an optical fiber, or a digital subscriber line (DSL)) or wireless (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer. It may also be a data storage device, such as a server or a data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid-state drive (SSD)), or the like.

Claims (29)

  1. A reset system, wherein the system comprises a reset control circuit, a processor core, and a first module; the first module comprises a first register configured to store fault replacement information; the fault replacement information comprises location information of a first storage unit; and the first storage unit is a storage unit that is faulty when fault replacement is performed on storage units in a memory;
    the reset control circuit is configured to obtain a warm reset signal; and
    the reset control circuit is further configured to send, in response to the obtained warm reset signal, a reset instruction to a second module, wherein the second module comprises the processor core and does not comprise the first module, and the reset instruction is used to trigger a reset operation.
  2. The system according to claim 1, wherein the fault replacement information further comprises location information of a second storage unit, and the second storage unit is a backup storage unit used when fault replacement is performed on storage units in the memory.
  3. The system according to claim 1, wherein a granularity of the first storage unit is any one of the following: a memory cell, a memory row, a memory block, a memory chip, a memory side, or a memory module.
  4. The system according to claim 2, wherein the first module further comprises at least one second storage unit; and when at least one first storage unit is a memory cell, a second storage unit in the first module is configured to store data of the first storage unit that is a memory cell.
  5. The system according to any one of claims 1 to 4, wherein the system comprises a memory controller, and the first module is integrated in the memory controller; and
    the reset control circuit is specifically configured to send a reset instruction to the processor core and not to send a reset instruction to the memory controller.
  6. The system according to any one of claims 1 to 4, wherein
    the reset control circuit is further configured to send, when a cold reset signal is obtained, a reset instruction to the processor core and the first module.
  7. A data processing system, wherein the system comprises a processor core and a first module; the first module comprises a first register configured to store fault replacement information; the fault replacement information comprises location information of a first storage unit; and the first storage unit is a storage unit that is faulty when fault replacement is performed on storage units in a memory;
    the processor core is configured to obtain the fault replacement information from the first register; and
    the processor core is further configured to write the fault replacement information into a non-volatile storage medium.
  8. The system according to claim 7, wherein the fault replacement information further comprises location information of a second storage unit, and the second storage unit is a backup storage unit used when fault replacement is performed on storage units in the memory.
  9. The system according to claim 7, wherein a granularity of the first storage unit is any one of the following: a memory cell, a memory row, a memory block, a memory chip, a memory side, or a memory module.
  10. The system according to any one of claims 7 to 9, wherein the system comprises a memory controller, and the first module is integrated in the memory controller.
  11. The system according to any one of claims 7 to 9, wherein
    the processor core is further configured to, when a reset operation is a warm reset operation, obtain a fault replacement information set from the non-volatile storage medium and backfill the fault replacement information set into the first register while the first register is being reset, wherein the fault replacement information set comprises at least one piece of the fault replacement information.
  12. The system according to claim 8, wherein the first module further comprises at least one second storage unit; and when at least one first storage unit is a memory cell, a second storage unit in the first module is configured to store first data of the first storage unit that is a memory cell;
    the processor core is further configured to obtain the first data from the second storage unit in the first module and write the first data into the non-volatile storage medium, so that the first data is not lost when the processor core and the first module perform a reset operation; and
    the processor core is further configured to, when the reset operation is a warm reset operation, obtain a fault replacement information set and the first data from the non-volatile storage medium, and, while the first module is being reset, backfill the fault replacement information set into the first register and backfill the first data into the second storage unit in the first module, wherein the fault replacement information set comprises at least one piece of the fault replacement information.
  13. The system according to claim 8, wherein
    the processor core is further configured to perform a reset operation on the first module to initialize the first module; and
    the processor core is further configured to, when the reset operation is a warm reset operation, obtain a fault replacement information set from the non-volatile storage medium and perform an inverse replacement operation on data in the storage units of the memory based on the fault replacement information set, wherein the fault replacement information set comprises at least one piece of the fault replacement information, and the inverse replacement operation is used to rewrite data in the second storage unit into the first storage unit, so that the distribution of data in the memory across storage units is restored to an initial state.
  14. The system according to claim 11, wherein
    the processor core is further configured not to obtain the fault replacement information set from the non-volatile storage medium when the reset operation is a cold reset operation.
  15. A reset method, wherein the method is applied to a reset system; the system comprises a reset control circuit, a processor core, and a first module; the first module comprises a first register configured to store fault replacement information; the fault replacement information comprises location information of a first storage unit; the first storage unit is a storage unit that is faulty when fault replacement is performed on storage units in a memory; and the method comprises:
    obtaining, by the reset control circuit, a warm reset signal; and
    sending, by the reset control circuit in response to the obtained warm reset signal, a reset instruction to a second module, wherein the second module comprises the processor core and does not comprise the first module, and the reset instruction is used to trigger a reset operation.
  16. The method according to claim 15, wherein the fault replacement information further comprises location information of a second storage unit, and the second storage unit is a backup storage unit used when fault replacement is performed on storage units in the memory.
  17. The method according to claim 15, wherein a granularity of the first storage unit is any one of the following: a memory cell, a memory row, a memory block, a memory chip, a memory side, or a memory module.
  18. The method according to claim 16, wherein the first module further comprises at least one second storage unit; and when at least one first storage unit is a memory cell, a second storage unit in the first module is configured to store data of the first storage unit that is a memory cell.
  19. The method according to any one of claims 15 to 18, wherein the system comprises a memory controller, and the first module is integrated in the memory controller; and
    the sending, by the reset control circuit, of a reset instruction to the processor core without sending a reset instruction to the first module comprises:
    sending, by the reset control circuit, a reset instruction to the processor core without sending a reset instruction to the memory controller.
  20. The method according to any one of claims 15 to 18, wherein the method further comprises:
    sending, by the reset control circuit when a cold reset signal is obtained, a reset instruction to the processor core and the first module.
  21. A data processing method, wherein the method is applied to a data processing system; the data processing system comprises a processor core and a first module; the first module comprises a first register configured to store fault replacement information; the fault replacement information comprises location information of a first storage unit; the first storage unit is a storage unit that is faulty when fault replacement is performed on storage units in a memory; and the method comprises:
    obtaining, by the processor core, the fault replacement information from the first register; and
    writing, by the processor core, the fault replacement information into a non-volatile storage medium.
  22. The method according to claim 21, wherein the fault replacement information further comprises location information of a second storage unit, and the second storage unit is a backup storage unit used when fault replacement is performed on storage units in the memory.
  23. The method according to claim 21, wherein the granularity of the first storage unit is any one of the following: memory cell, memory row, memory block, memory chip, memory side, or memory module (DIMM).
  24. The method according to any one of claims 21 to 23, wherein the system comprises a memory controller, and the first module is integrated in the memory controller.
  25. The method according to any one of claims 21 to 23, further comprising:
    when the reset operation is a warm reset operation, acquiring, by the processor core, a fault replacement information set from the non-volatile storage medium and, while the first module is being reset, backfilling the fault replacement information set into the first register, wherein the fault replacement information set comprises at least one piece of the fault replacement information.
  26. The method according to claim 22, wherein the first module further comprises at least one second storage unit, and when at least one first storage unit is a memory cell, the second storage unit in the first module is configured to store first data of that first storage unit;
    the method further comprises:
    acquiring, by the processor core, the first data from the second storage unit in the first module and writing the first data into the non-volatile storage medium, so that the first data is not lost when the processor core and the first module perform a reset operation; and
    when the reset operation is a warm reset operation, acquiring, by the processor core, the fault replacement information set and the first data from the non-volatile storage medium and, while the first module is being reset, backfilling the fault replacement information set into the first register and the first data into the second storage unit in the first module, wherein the fault replacement information set comprises at least one piece of the fault replacement information.
  27. The method according to claim 22, further comprising:
    performing, by the processor core, a reset operation on the first module to initialize the first module; and
    when the reset operation is a warm reset operation, acquiring, by the processor core, the fault replacement information set from the non-volatile storage medium and performing, according to that set, a reverse replacement operation on the data in the storage units of the memory, wherein the fault replacement information set comprises at least one piece of the fault replacement information, and the reverse replacement operation rewrites the data in the second storage unit into the first storage unit so that the distribution of the data in the memory across storage units is restored to its initial state.
  28. The method according to claim 25, further comprising:
    when the reset operation is a cold reset operation, the processor core does not acquire the fault replacement information set from the non-volatile storage medium.
  29. A computer device, wherein the computer device is provided with the reset system according to any one of claims 1 to 6, or with the data processing system according to any one of claims 7 to 14.
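The reset routing in claims 19 and 20 can be modeled as a small dispatch function. This is a minimal Python sketch, not the claimed circuit: `ResetSignal`, `route_reset`, and the target labels are hypothetical names, and it assumes (claim 15 is not shown here) that the case in claim 19, where the memory controller is not reset, corresponds to a warm reset, by contrast with the cold reset of claim 20.

```python
from enum import Enum


class ResetSignal(Enum):
    """Hypothetical reset-signal types seen by the reset control circuit."""
    WARM = "warm"
    COLD = "cold"


# Hypothetical labels for the reset targets named in claims 19 and 20.
PROCESSOR_CORE = "processor_core"
MEMORY_CONTROLLER = "memory_controller"  # hosts the first module (claim 19)


def route_reset(signal: ResetSignal) -> set[str]:
    """Return the set of blocks that receive a reset instruction.

    Warm reset (assumed): only the processor core is reset, so the memory
    controller, and the fault replacement information in its first register,
    keeps its state.
    Cold reset (claim 20): both the processor core and the first module
    (here, the memory controller) are reset.
    """
    if signal is ResetSignal.COLD:
        return {PROCESSOR_CORE, MEMORY_CONTROLLER}
    return {PROCESSOR_CORE}
```

Keeping the memory controller out of the warm-reset set is what lets the first register survive a warm reset without any save/restore step in this model.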
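Claims 21 and 25 to 28 describe copying the contents of the first module to non-volatile storage and restoring them only after a warm reset. The Python sketch below models that flow under stated assumptions: a plain dict stands in for the non-volatile storage medium, storage-unit locations are integer addresses, and `FirstModule`, `save_before_reset`, `reset_module`, and `ResetKind` are hypothetical names, not an interface from the application. The backfill is shown as a step right after the module is cleared, which simplifies "while the first module is being reset".

```python
from dataclasses import dataclass, field
from enum import Enum


class ResetKind(Enum):
    WARM = "warm"
    COLD = "cold"


@dataclass
class FirstModule:
    """Hypothetical model of the first module (claims 21 and 26).

    register: the first register, holding (first_unit, second_unit) records,
        i.e. faulty-unit and backup-unit locations (claims 21 and 22).
    spare_cells: second storage units inside the module, holding the first
        data of faulty memory cells, keyed by faulty-cell address (claim 26).
    """
    register: list[tuple[int, int]] = field(default_factory=list)
    spare_cells: dict[int, int] = field(default_factory=dict)


def save_before_reset(module: FirstModule, nvm: dict) -> None:
    """Processor core copies the register and first data into NVM."""
    # Copies are stored so that clearing the module does not alter NVM.
    nvm["fault_set"] = list(module.register)
    nvm["first_data"] = dict(module.spare_cells)


def reset_module(module: FirstModule, kind: ResetKind, nvm: dict) -> None:
    """Reset the first module, backfilling from NVM only on a warm reset."""
    # Re-initialization clears the volatile contents of the module.
    module.register.clear()
    module.spare_cells.clear()
    if kind is ResetKind.WARM:
        # Claims 25 and 26: backfill the fault replacement information set
        # and the first data.
        module.register.extend(nvm.get("fault_set", []))
        module.spare_cells.update(nvm.get("first_data", {}))
    # Claim 28: on a cold reset NVM is not read, so the module stays empty.
```

For example, after `save_before_reset(m, nvm)`, a warm reset restores `m.register` and `m.spare_cells`, while a later cold reset leaves both empty even though the NVM copy is still present.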
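The reverse replacement operation of claim 27 can be pictured with a dictionary that maps storage-unit addresses to stored values. In this hypothetical Python sketch, `reverse_replace` and the (first_unit, second_unit) tuple layout are assumptions for illustration; it only shows data being copied from each backup unit back to the unit it replaced, not how a real memory controller addresses DRAM.

```python
def reverse_replace(memory: dict[int, int],
                    fault_set: list[tuple[int, int]]) -> dict[int, int]:
    """Copy data from backup (second) units back to the original (first) units.

    memory maps a storage-unit address to its stored value. Each entry of
    fault_set is (first_unit, second_unit): the faulty unit and the backup
    unit that received its data during fault replacement. Returns a new dict
    in which every first unit again holds its data, i.e. the layout before
    replacement; the input dict and the backup contents are left unchanged.
    """
    restored = dict(memory)
    for first_unit, second_unit in fault_set:
        if second_unit in restored:  # skip records whose backup is absent
            restored[first_unit] = restored[second_unit]
    return restored
```

Returning a new dict rather than mutating in place keeps the example easy to check; an in-place version would behave the same for the restored addresses.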
PCT/CN2021/102029 2020-06-24 2021-06-24 Reset system, data processing system, and related device WO2021259351A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010588804.3A CN113835923A (en) 2020-06-24 2020-06-24 Reset system, data processing system and related equipment
CN202010588804.3 2020-06-24

Publications (1)

Publication Number Publication Date
WO2021259351A1 (en) 2021-12-30

Family

ID=78964602

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/102029 WO2021259351A1 (en) 2020-06-24 2021-06-24 Reset system, data processing system, and related device

Country Status (2)

Country Link
CN (1) CN113835923A (en)
WO (1) WO2021259351A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115705262A (en) * 2021-08-17 2023-02-17 华为技术有限公司 Memory fault recovery method and system and memory

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102736957A (en) * 2012-05-25 2012-10-17 华为技术有限公司 Resetting method and device
CN103116551A (en) * 2013-01-31 2013-05-22 苏州国芯科技有限公司 Nor FLASH memory interface module applied to configurable logic block (CLB) bus
CN103235760A (en) * 2013-01-31 2013-08-07 苏州国芯科技有限公司 CLB-bus-based NorFLASH memory interface chip with high utilization ratio
US20170351564A1 (en) * 2016-06-06 2017-12-07 Canon Kabushiki Kaisha Control apparatus and control method for processor initialization
CN107678420A * 2017-09-30 2018-02-09 北京理工大学 Online storage method for engine data

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023179634A1 (en) * 2022-03-22 2023-09-28 华为技术有限公司 Data writing method and processing system
CN115168087A (en) * 2022-07-08 2022-10-11 超聚变数字技术有限公司 Method and device for determining granularity of repair resources of memory failure
CN115168087B (en) * 2022-07-08 2024-03-19 超聚变数字技术有限公司 Method and device for determining repair resource granularity of memory failure
WO2024016864A1 (en) * 2022-07-19 2024-01-25 华为技术有限公司 Processor, information acquisition method, single board and network device
WO2024179469A1 (en) * 2023-03-01 2024-09-06 华为技术有限公司 Electronic device and related reset recovery method

Also Published As

Publication number Publication date
CN113835923A (en) 2021-12-24

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 21830029
Country of ref document: EP
Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

122 Ep: PCT application non-entry in European phase
Ref document number: 21830029
Country of ref document: EP
Kind code of ref document: A1