WO2021259351A1 - Reset system, data processing system, and related device

Info

Publication number: WO2021259351A1
Application number: PCT/CN2021/102029
Authority: WO
Inventors: 刁阳彬; 韩林
Original assignee: 华为技术有限公司
Priority date: 2020-06-24
Filing date: 2021-06-24
Publication date: 2021-12-30
Also published as: CN113835923A

Abstract

The embodiments of the present application disclose a reset system, a data processing system, and a related device, which can be applied in the field of managing memory data. The reset system comprises a reset control circuit, a processor core, and a first module that contains a first register. Failure replacement information recorded by the first register comprises location information of a first storage unit, where the first storage unit is the storage unit that has failed when failure replacement is performed on storage units in a memory. The reset control circuit responds to an obtained reset signal by sending a reset instruction to a second module. The second module comprises the processor core and does not comprise the first module. The new concept of failure replacement information is proposed, and a first register dedicated to recording failure replacement information is added. After a reset operation is complete, the data in the memory can be correctly accessed according to the failure replacement information, so the data in the memory is not lost when failure replacement technology and reset technology are both used in the memory.

Description

A reset system, data processing system and related equipment

This application claims priority to Chinese patent application No. 202010588804.3, filed with the Chinese Patent Office on June 24, 2020 and entitled "a reset system, data processing system and related equipment", the entire content of which is incorporated in this application by reference.

Technical field

This application relates to the field of computer technology, in particular to a reset system, a data processing system and related equipment.

Background art

As memory capacity and memory speed increase, the memory failure rate continues to rise. When a storage unit in the memory fails and the failed storage unit is not handled in time, an uncorrectable error (UCE) can easily occur, causing problems such as system downtime and, in turn, hardware returns. At present, the probability of a UCE can be reduced by performing fault replacement on a faulty storage unit before a memory UCE occurs. Fault replacement refers to writing the data in a failed storage unit of the memory into a backup storage unit of the memory, thereby isolating the failed storage unit.
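The fault replacement described above can be illustrated with a minimal Python sketch. It is only a toy model written for this explanation: the `Memory` class, its fields and the list-based storage are hypothetical and do not describe any real memory controller.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Toy memory (hypothetical): regular storage units plus backup units."""
    units: list                                   # data in the regular storage units
    spares: list                                  # data in the backup storage units
    isolated: set = field(default_factory=set)    # indices of isolated faulty units
    remap: dict = field(default_factory=dict)     # faulty index -> backup index

    def fault_replace(self, faulty: int) -> int:
        """Write the data of a faulty unit into a free backup unit and isolate it."""
        used = set(self.remap.values())
        # Raises StopIteration if no backup unit is free (not handled in this sketch).
        free = next(i for i in range(len(self.spares)) if i not in used)
        self.spares[free] = self.units[faulty]
        self.remap[faulty] = free
        self.isolated.add(faulty)
        return free

    def read(self, index: int):
        """Read a unit, following the replacement mapping when one exists."""
        if index in self.remap:
            return self.spares[self.remap[index]]
        return self.units[index]


mem = Memory(units=["a", "b", "c"], spares=[None, None])
spare = mem.fault_replace(1)   # unit 1 is treated as faulty
value = mem.read(1)            # served from the backup unit
```

In this model, correct reads of unit 1 depend entirely on the `remap` table. If that table were cleared, `read(1)` would return to the isolated unit.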

However, after fault replacement is performed on a storage unit in the memory, the distribution of data in the memory changes. If the system is subsequently reset, the data in the memory cannot be accessed correctly after the reset, and the memory data is lost.

Therefore, how to avoid losing the data in the memory when both fault replacement technology and reset technology are used in the memory has become an urgent problem to be solved.

Summary of the invention

This application provides a reset system, a data processing system and related equipment. It proposes a new concept of fault replacement information and adds a first register dedicated to recording fault replacement information, which ensures that the fault replacement information is not lost during a reset. After the reset operation is completed, the data in the memory can therefore be accessed correctly according to the fault replacement information, so that the data in the memory is not lost when both fault replacement technology and reset technology are used in the memory.

In order to solve the above technical problems, this application provides the following technical solutions:

In the first aspect, this application provides a reset system that can be used in the field of managing memory data. The reset system includes a reset control circuit, a processor core and a first module. The first module includes a first register, and the first register is used to store fault replacement information. One piece of fault replacement information includes location information of a first storage unit corresponding to one fault replacement operation, where the first storage unit is the faulty storage unit when fault replacement is performed on storage units in the memory, that is, the storage unit that is replaced. The first register may specifically be a status register or a configuration register. The reset control circuit is used to obtain a warm reset signal and, in response to the obtained warm reset signal, send a reset instruction to a second module. The second module includes the processor core and does not include the first module; that is, the reset control circuit sends the reset instruction to the processor core but does not send it to the first module. The reset instruction is used to trigger execution of the reset operation, so that the fault replacement information in the first register is not cleared after the reset operation is completed. Reset refers to restoring the state of the module, unit or device being reset to its state at first power-on. The warm reset signal is used to trigger a warm reset operation. The reset instruction may be a set of low-level signals that includes at least one low-level signal; it may also be a set of electrical signals that includes both low-level and high-level signals.
In this implementation, a new concept of fault replacement information is proposed, and a first register dedicated to storing fault replacement information is added to the reset system. During the reset operation, the first module is controlled not to reset, so the fault replacement information in the first register is not cleared after the reset operation is completed. Even if some storage units in the memory have been isolated and replaced because fault replacement was performed on faulty storage units, the system can, after being reset, learn from the fault replacement information which storage units in the memory are isolated faulty storage units. This avoids system downtime caused by accessing an isolated faulty storage unit; in other words, the memory is accessed correctly, and the data in the memory is not lost when both fault replacement technology and reset technology are used in the memory.
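The effect of sending the reset instruction only to the second module can be sketched as a small software model. This is an assumption-laden illustration, not the circuit itself: `Module`, `ResetControlCircuit`, the `pc` field and the record layout are hypothetical names chosen for the example.

```python
import copy


class Module:
    """A resettable block with volatile state (hypothetical model)."""

    def __init__(self, name, state):
        self.name = name
        self.power_on_state = copy.deepcopy(state)
        self.state = copy.deepcopy(state)

    def reset(self):
        # Reset restores the state the module had at first power-on.
        self.state = copy.deepcopy(self.power_on_state)


class ResetControlCircuit:
    """Routes the reset instruction to the second module only."""

    def __init__(self, second_module, first_module):
        self.second_module = second_module   # modules that include the processor core
        self.first_module = first_module     # holds the first register

    def on_warm_reset(self):
        for module in self.second_module:
            module.reset()
        # No reset instruction is sent to self.first_module.


core = Module("processor core", {"pc": 0})
first = Module("first module", {"fault_replacement_info": []})

core.state["pc"] = 1234
first.state["fault_replacement_info"].append({"first_unit": 17})

ResetControlCircuit([core], first).on_warm_reset()
```

After the warm reset in this model, the processor core is back in its power-on state while the record for storage unit 17 is still present in the first register.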

In a possible implementation of the first aspect, one or more pieces of fault replacement information are recorded in the first module, and one piece of fault replacement information further includes location information of a second storage unit corresponding to one fault replacement operation. The second storage unit is the backup storage unit used when fault replacement is performed on storage units in the memory, that is, the storage unit that takes the place of the first storage unit.

In this implementation, the fault replacement information includes at least the location information of the storage unit that is replaced and the location information of the storage unit that replaces it. The fault replacement operations that occur in the memory are thus recorded through the fault replacement information, which reflects not only which storage units in the memory have been replaced and isolated due to faults, but also which storage unit holds the data after the fault replacement. It therefore directly reflects how the data in the memory is distributed across the storage units.

In a possible implementation of the first aspect, one piece of fault replacement information also includes the fault replacement type corresponding to one fault replacement operation. The fault replacement type can be any of the following: memory module replacement, memory plane replacement, memory particle replacement, memory block replacement, memory row replacement and memory storage cell replacement.
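One possible way to picture a piece of fault replacement information is as a small record. The sketch below is hypothetical: the application does not define a field layout, so `FaultReplacementInfo`, `ReplacementType` and the tuple-based location encoding are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class ReplacementType(Enum):
    """The six fault replacement types named in the text."""
    MODULE = "memory module replacement"
    PLANE = "memory plane replacement"
    PARTICLE = "memory particle replacement"
    BLOCK = "memory block replacement"
    ROW = "memory row replacement"
    CELL = "memory storage cell replacement"


# Hypothetical location encoding: indices from the coarsest level downwards.
Location = Tuple[int, ...]


@dataclass(frozen=True)
class FaultReplacementInfo:
    first_unit: Location                        # the faulty unit that is replaced
    second_unit: Optional[Location] = None      # the backup unit that replaces it
    replacement_type: Optional[ReplacementType] = None


info = FaultReplacementInfo(
    first_unit=(0, 1, 3, 2, 511),
    second_unit=(0, 1, 3, 2, 1023),
    replacement_type=ReplacementType.ROW,
)
```

Only `first_unit` is mandatory in this sketch, mirroring the text, where the second storage unit location and the replacement type appear in optional implementations.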

In a possible implementation of the first aspect, the granularity of the faulty first storage unit is any one of the following: memory storage cell, memory row, memory block, memory particle, memory plane, and memory bar. The memory storage cell is the storage unit with the smallest granularity in the memory. A memory row includes multiple memory storage cells, a memory block includes multiple memory rows, a memory particle includes multiple memory blocks, a memory plane includes multiple memory particles, and a memory bar includes one or two memory planes. In this implementation, the granularity of a storage unit in the memory can be any of memory storage cell, memory row, memory block, memory particle, memory plane, or memory bar, so the fault replacement information can reflect a fault replacement operation at any of these granularities. In other words, this solution supports fault replacement operations of any granularity, which improves its implementation flexibility.

In a possible implementation of the first aspect, the first module further includes at least one second storage unit. The second storage unit in the first module is used to store, when the at least one first storage unit includes a faulty storage unit whose granularity is a memory storage cell, the data of that first storage unit. Further, the granularity of the second storage unit configured in the first module may be a memory storage cell, a memory row, or another granularity, and the number of second storage units in the first module may be 32, 64, or 128. In the embodiment of the present application, when the at least one first storage unit includes a first storage unit whose granularity is a memory storage cell, the data in the faulty memory storage cell is written into the backup storage unit in the first module. Since the reset instruction is not sent to the first module during the reset process, the data in the backup storage unit is not cleared after the reset operation is completed, which ensures data integrity.

In a possible implementation of the first aspect, the system includes a memory controller, and the first module is integrated in the memory controller. The reset control circuit is then specifically used to send the reset instruction to the processor core and not to send the reset instruction to the memory controller. In this implementation, the fault replacement information recorded in the first module indicates how the data in the memory is distributed across the storage units, and the memory controller is used to manage the memory. Integrating the first module into the memory controller therefore makes it easier for the memory controller to manage the first module and to read the fault replacement information in order to manage the memory. In addition, keeping the entire memory controller out of the reset avoids the problem of different modules in the memory controller being out of sync after a reset.

In a possible implementation of the first aspect, the reset control circuit is further configured to send the reset instruction to the processor core and the first module when a cold reset signal is obtained. The cold reset signal is used to trigger a cold reset operation. A cold reset operation restores the entire reset system and the memory to the first power-on state, and can generally be performed by powering off and on again. In this implementation, the fact that the reset control circuit obtains a cold reset signal indicates that the reset operation was triggered because the memory is faulty. The memory then needs to be reset, that is, the data in the memory will be cleared, so there is no need to ensure that the data in the memory is not lost. The first module also performs the reset operation, so that new fault replacement information can be written to the first module again after the reset operation is completed, which keeps the entire reset system in sync.

In a possible implementation of the first aspect, the reset control circuit may include a logic circuit. When the reset control circuit obtains the warm reset signal, the output terminal of the reset control circuit is not coupled with the first module; when the reset control circuit obtains the cold reset signal, the output terminal of the reset control circuit is coupled with the first module.
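The coupling behaviour of such a logic circuit can be summarised as a truth table. The following Python function is a hedged sketch of that table only; the function name, the boolean encoding of the signals and the output keys are assumptions, and the actual signal polarity (for example, active-low reset lines) is not modelled.

```python
def reset_outputs(warm_reset: bool, cold_reset: bool) -> dict:
    """Return which modules receive the reset instruction (illustrative only)."""
    return {
        # The processor core is reset by either kind of reset signal.
        "processor_core": warm_reset or cold_reset,
        # The first module is coupled to the output only for a cold reset.
        "first_module": cold_reset,
    }


warm = reset_outputs(warm_reset=True, cold_reset=False)
cold = reset_outputs(warm_reset=False, cold_reset=True)
idle = reset_outputs(warm_reset=False, cold_reset=False)
```

In hardware terms this corresponds to an OR of the two signals driving the processor core reset and the cold reset signal alone driving the first module reset, under the assumptions stated above.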

In a possible implementation of the first aspect, the reset control circuit is further configured to send a first instruction to the first module, and the first instruction instructs the first module not to perform a reset operation.

In the second aspect, this application provides a data processing system that can be used in the field of managing memory data. The data processing system includes a processor core and a first module. The first module includes a first register, and the first register is used to store fault replacement information. The fault replacement information includes location information of a first storage unit, where the first storage unit is the faulty storage unit when fault replacement is performed on storage units in the memory. The processor core is used to obtain the fault replacement information from the first register and write it into a non-volatile storage medium, so that the fault replacement information is not lost when the processor core and the first module perform a reset operation. In this implementation, a new concept of fault replacement information is proposed, and a first register dedicated to storing fault replacement information is added to the system. After the memory controller writes fault replacement information to the first module, the processor core writes the newly generated fault replacement information into the non-volatile storage medium, so that a reset of the data processing system does not cause the fault replacement information to be lost. Even if some storage units in the memory have been isolated and replaced because fault replacement was performed on faulty storage units, the system can, after being reset, learn from the fault replacement information which storage units in the memory are isolated faulty storage units. This avoids system downtime caused by accessing an isolated faulty storage unit; in other words, the memory is accessed correctly, and the data in the memory is not lost when both fault replacement technology and reset technology are used in the memory.
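As a rough software analogy of writing the register contents to a non-volatile storage medium, the sketch below saves a list of records to a file. Everything here is an assumption for illustration: a JSON file stands in for the non-volatile medium, and `persist_fault_info`, the record layout and the file name are hypothetical.

```python
import json
import os
import tempfile


def persist_fault_info(register: list, path: str) -> None:
    """Write the first register's records to a file standing in for non-volatile storage."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(register, f)
        f.flush()
        os.fsync(f.fileno())       # push the data to the storage device
    os.replace(tmp_path, path)     # swap in the complete file in one step


# Hypothetical register contents: one record per fault replacement operation.
first_register = [{"first_unit": [0, 1, 3, 2, 511], "second_unit": [0, 1, 3, 2, 1023]}]

nvm_dir = tempfile.mkdtemp()
nvm_path = os.path.join(nvm_dir, "fault_replacement_info.json")
persist_fault_info(first_register, nvm_path)
```

Writing to a temporary file and renaming it is a design choice of this sketch, intended to avoid leaving a half-written record set if a reset interrupts the write.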

In a possible implementation of the second aspect, the fault replacement information further includes location information of a second storage unit, and the second storage unit is the backup storage unit used when fault replacement is performed on storage units in the memory.

In a possible implementation of the second aspect, the granularity of the faulty storage unit is any one of the following: memory storage cells, memory rows, memory blocks, memory particles, memory planes, and memory bars.

In a possible implementation of the second aspect, the system includes a memory controller, and the first module is integrated in the memory controller.

In a possible implementation of the second aspect, the processor core is also used to obtain a fault replacement information set from the non-volatile storage medium when the reset operation is a warm reset operation, and to backfill the fault replacement information set into the first register during the reset of the first register. The fault replacement information set includes at least one piece of fault replacement information. In this implementation, the processor core obtains the fault replacement information set from the non-volatile storage medium and backfills it directly into the first module while the first module is being reset. After the data processing system is reset, the memory controller can therefore directly use the fault replacement information in the first module to access the memory accurately, which is simple to operate and easy to implement.
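The backfill step can be sketched as follows. This is a simplified model under stated assumptions: the first register is represented as a Python list, the records read from the non-volatile medium are passed in as `nvm_records`, and the function name and the string-valued `reset_type` are hypothetical.

```python
def reset_first_register(nvm_records: list, reset_type: str) -> list:
    """Model the first register's contents after a reset (illustrative only)."""
    register = []                       # the reset itself clears the register
    if reset_type == "warm":
        # Backfill the fault replacement information set during the reset.
        register.extend(nvm_records)
    return register


saved = [{"first_unit": 17, "second_unit": 3}]
after_warm = reset_first_register(saved, "warm")
after_other = reset_first_register(saved, "cold")
```

In this model only a warm reset restores the saved records; any other reset type leaves the register empty.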

In a possible implementation of the second aspect, the first module further includes at least one second storage unit. The second storage unit in the first module is used to store, when the at least one first storage unit includes a storage unit whose granularity is a memory storage cell, the first data of that first storage unit. The processor core is also used to obtain the first data from the second storage unit in the first module and write the first data into the non-volatile storage medium, so that the first data is not lost when the processor core and the first module perform a reset operation. The processor core is also used to obtain the fault replacement information set and the first data from the non-volatile storage medium when the reset operation is a warm reset operation, and, during the reset of the first module, to backfill the fault replacement information set into the first register and the first data into the second storage unit in the first module. The fault replacement information set includes at least one piece of fault replacement information. In this implementation, the first data stored in the second storage unit in the first module is also written into the non-volatile storage medium. When the first module is reset, the first data is backfilled into the first module so that it is not lost, which ensures data integrity.

In a possible implementation of the second aspect, the processor core is also used to perform a reset operation on the first module to initialize the first module and, when the reset operation is a warm reset operation, to obtain the fault replacement information set from the non-volatile storage medium and perform a reverse replacement operation on the data in the storage units of the memory according to the fault replacement information set. The fault replacement information set includes at least one piece of fault replacement information. The reverse replacement operation writes the data in the second storage unit back into the first storage unit, so that the distribution of the data in the memory across the storage units is restored to the initial state. Restoring this distribution to the initial state does not mean clearing the data in the memory; it means storing the data in the memory in the way it was stored before fault replacement was performed.

In this implementation, a storage unit in the memory may have met the fault replacement condition because of a processor core failure or a memory controller failure. After the processor core and the memory controller are reset, such a storage unit may become usable again. Performing the reverse replacement operation on the data in the storage units of the memory after resetting the processor core and the memory controller therefore releases the backup storage units, which helps extend the service life of the memory.
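The reverse replacement operation can be illustrated with a short sketch. The list-based `units` and `spares`, the record keys and the function name are hypothetical; the sketch only shows data moving from the second storage unit back to the first storage unit and the backup unit being released.

```python
def reverse_replace(units: list, spares: list, records: list) -> None:
    """Write data held in backup units back into the original units (in place)."""
    for record in records:
        first = record["first_unit"]      # index of the unit that was replaced
        second = record["second_unit"]    # index of the backup unit holding its data
        units[first] = spares[second]     # restore the original data distribution
        spares[second] = None             # release the backup unit for later use


units = ["a", "stale", "c"]       # unit 1 was isolated by an earlier fault replacement
spares = ["b", None]              # backup unit 0 currently holds unit 1's data
records = [{"first_unit": 1, "second_unit": 0}]

reverse_replace(units, spares, records)
```

After the call, the data is laid out as it was before the fault replacement, and no data has been cleared from the regular storage units.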

In a possible implementation manner of the second aspect, the processor core is further configured to not obtain the failure replacement information set from the non-volatile storage medium when the reset operation is a cold reset operation.

In a possible implementation of the second aspect, the processor core is further configured to not obtain the failure replacement information set and the first data from the non-volatile storage medium when the reset operation is a cold reset operation.

For the meanings of the terms used in the second aspect and its possible implementations, the specific implementation steps, and the beneficial effects of each possible implementation, refer to the descriptions of the possible implementations of the first aspect; they are not repeated here.

In the third aspect, this application provides a reset method that can be used in the field of managing memory data. The method is applied to a reset system. The system includes a reset control circuit, a processor core, and a first module. The first module includes a first register, and the first register is used to store fault replacement information. The fault replacement information includes location information of a first storage unit, where the first storage unit is the faulty storage unit when fault replacement is performed on storage units in the memory. The reset control circuit obtains a warm reset signal; in response to the obtained warm reset signal, the reset control circuit sends a reset instruction to a second module. The second module includes the processor core and does not include the first module. The reset instruction is used to trigger the reset operation.

The method of the third aspect can also be used to execute the steps in the various implementations of the first aspect. For the specific implementation steps of the third aspect and its possible implementations, and for the beneficial effects of each possible implementation, refer to the descriptions of the possible implementations of the first aspect; they are not repeated here.

In the fourth aspect, this application provides a data processing method that can be used in the field of managing memory data. The method is applied to a data processing system. The data processing system includes a processor core and a first module. The first module includes a first register, and the first register is used to store fault replacement information. The fault replacement information includes location information of a first storage unit, where the first storage unit is the faulty storage unit when fault replacement is performed on storage units in the memory. The processor core obtains the fault replacement information from the first register; the processor core writes the fault replacement information into a non-volatile storage medium, so that the fault replacement information is not lost when the processor core and the first module perform a reset operation.

The method of the fourth aspect can also be used to execute the steps in the various implementations of the second aspect. For the specific implementation steps of the fourth aspect and its possible implementations, and for the beneficial effects of each possible implementation, refer to the descriptions of the possible implementations of the second aspect; they are not repeated here.

In a fifth aspect, the present application provides a computer device configured with the reset system described in the first aspect above, or configured with the data processing system described in the second aspect above.

In a sixth aspect, the present application provides a chip system including a processor for supporting the realization of the functions involved in the above aspects, for example, sending or processing the data and/or information involved in the above methods. In a possible design, the chip system further includes a memory, and the memory is used to store necessary program instructions and data for the server or the communication device. The chip system can be composed of chips, and can also include chips and other discrete devices.

Description of the drawings

FIG. 1 is a schematic structural diagram of a reset system provided by an embodiment of this application;

FIG. 2 is a schematic diagram of a work flow of the reset system provided by an embodiment of the application;

FIG. 3 is a schematic diagram of a fault replacement technique in the reset method provided by an embodiment of the application;

FIG. 4 is a system schematic diagram of a reset system provided by an embodiment of this application;

FIG. 5 is a schematic diagram of a workflow of a data processing system provided by an embodiment of the application;

FIG. 6 is a schematic diagram of the reverse replacement operation in the data processing method provided by the embodiment of the application;

FIG. 7 is a system schematic diagram of a reset system provided by an embodiment of the application;

FIG. 8 is a schematic diagram of another system of the reset system provided by an embodiment of the application;

FIG. 9 is a system schematic diagram of a data processing system provided by an embodiment of this application;

FIG. 10 is a schematic diagram of another system of the data processing system provided by an embodiment of the application;

FIG. 11 is a schematic diagram of a structure of a computer device provided by the implementation of this application.

Detailed description

The embodiments of this application provide a reset system, a data processing system, and related equipment. They propose a new concept of fault replacement information and add a first register dedicated to recording fault replacement information, which ensures that the fault replacement information is not lost during a reset. After the reset operation is completed, the data in the memory can therefore be accessed correctly according to the fault replacement information, so that the data in the memory is not lost when both fault replacement technology and reset technology are used in the memory.

The terms "first", "second", etc. in the description and claims of this application and in the above-mentioned drawings are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It should be understood that terms used in this way can be interchanged under appropriate circumstances; this is merely the way objects with the same attribute are distinguished in the description of the embodiments of this application. In addition, the terms "including" and "having" and any variations of them are intended to cover non-exclusive inclusion, so that a process, method, system, product or device containing a series of units is not necessarily limited to those units, but may include other units that are not clearly listed or that are inherent to the process, method, product or device.

The embodiments of the present application will be described below in conjunction with the drawings. A person of ordinary skill in the art knows that with the development of technology and the emergence of new scenarios, the technical solutions provided in the embodiments of the present application are equally applicable to similar technical problems.

The reset system provided by the embodiments of this application is mainly applied to devices that process memory data. To make this solution easier to understand, the reset system is first introduced with reference to FIG. 1, which can also be regarded as a schematic structural diagram of the data processing system provided by the embodiments of this application. The reset system includes a processor and a memory, and the processor and the memory can be configured in any form of electronic equipment. The processor integrates a processor core (core), a reset control circuit, a memory controller (double data rate SDRAM controller, DDRC), and a high-speed physical interface transceiver (high-speed physical layer, HSPHY).

A software system can run on the processor core to provide the basic functions of an operating system. The reset control circuit is used to trigger a module or unit in the processor to perform a reset operation, and is also used to trigger the memory to perform a reset operation.

The memory controller is used to convert the address in an access request issued by the processor core into a physical address in the memory and pass the access request to the HSPHY. It is also used to schedule the access requests issued by the processor core efficiently, and to perform fault replacement operations on the storage units in the memory.

The HSPHY communicates with the memory outside the processor. It is used to obtain the digital signal generated by the memory controller, convert the digital signal into an electrical signal, and transmit it to the memory; it is also used to obtain the electrical signal generated by the memory, convert the electrical signal into a digital signal, and transmit it to the memory controller.

It should be noted that in actual products, the processor may include more or fewer modules or units. In addition, the memory controller may not be integrated into the processor, that is, the memory controller and the processor may be two independent devices. FIG. 1 is only an example to facilitate understanding of the application scenario of this solution, and is not used to limit this solution.

Based on the above description, the embodiments of this application provide a reset system to which a first module for storing fault replacement information is added. The fault replacement information reflects the distribution of data in the memory, and the system ensures that it is not lost after the reset system is reset. After the system is reset, the data in the memory can still be accessed correctly according to the fault replacement information, so that the data in the memory is not lost when both fault replacement technology and reset technology are used in the memory. Specifically, in one implementation, the first module is controlled not to perform the reset operation while the reset system is being reset, which avoids losing the fault replacement information. In another implementation, the fault replacement information in the first module is written into a non-volatile storage medium outside the reset system, so that a reset of the reset system does not cause the fault replacement information to be lost. The specific operations in these two cases differ considerably, so they are introduced separately below.

1. Failure replacement information is not cleared

In the embodiment of the present application, please refer to FIG. 2. FIG. 2 is a schematic diagram of a work flow of the reset system provided in the embodiment of the present application. The work flow of the reset system provided in the embodiment of the present application may include:

201. The processor core sends a fault replacement instruction to the memory controller.

In the embodiment of the present application, during operation of the reset system, the processor core can learn in real time whether the memory is faulty. When a memory storage cell in the memory fails, the memory controller corrects the data in that cell according to an error checking and correcting (ECC) algorithm. If the error is corrected successfully, the current fault is reported to the processor core as a correctable error (CE), and description information corresponding to the CE is then generated and recorded.
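The text does not specify which ECC algorithm the memory controller uses. Purely as an illustration of how a single-bit error can be corrected and its position reported, the sketch below uses the textbook Hamming(7,4) code; this choice, and the function names, are assumptions and not the algorithm of the application.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits as a 7-bit codeword laid out as p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(codeword):
    """Correct at most one flipped bit; return (data_bits, error_position).

    error_position is 0 when no error is detected, otherwise the 1-based
    position of the corrected bit, which can serve as the CE location.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]    # parity check over positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]    # parity check over positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]    # parity check over positions 4, 5, 6, 7
    position = s1 + 2 * s2 + 4 * s3
    if position:
        c[position - 1] ^= 1          # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], position


stored = hamming74_encode([1, 0, 1, 1])
stored[4] ^= 1                                   # simulate a single-bit fault
data, ce_position = hamming74_correct(stored)    # corrected data plus CE location
```

A plain Hamming(7,4) code miscorrects double-bit errors, so it only models the correctable case described here.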

The memory may include one or more memory modules, a memory module may include one or two memory planes (rank), a memory plane may include multiple memory particles (device), a memory particle may include multiple memory blocks (bank), a memory block may include multiple memory rows (row), and a memory row may include multiple memory storage cells (cell). It should be noted that the foregoing divides the memory according to the size of the storage space. In actual situations, the memory can also be divided from other angles.

The description information includes at least the location information of the CE error. This location information indicates which storage unit in the memory has the CE error. The storage unit in the memory in the embodiment of the present application may specifically refer to one or more of the following: a memory storage cell, a memory row, a memory block, a memory particle, a memory plane, and a memory module. Optionally, the description information may also include the CE error type. The types of CE errors include, but are not limited to, CE errors generated when the processor core accesses the memory, CE errors generated during periodic memory inspections, or other types of CE errors, which are not exhaustively listed here. In the embodiments of the present application, the granularity of the storage unit may be any of a memory storage cell, a memory row, a memory block, a memory particle, a memory plane, or a memory module. That is, the fault replacement information can reflect a fault replacement operation at any of the foregoing granularities, so this solution supports fault replacement at any granularity, which improves the implementation flexibility of this solution.

After the processor core obtains the description information corresponding to the CE error, it can determine the specific location in the memory where the CE error occurs, and then determine whether the storage unit in the memory meets the failure replacement condition. When the processor core determines that the fault replacement condition is met, it sends a fault replacement instruction to the memory controller, and when the fault replacement condition is not met, it can continue to monitor the memory.

Fault replacement can be divided into multiple types according to granularity: memory module replacement, memory plane replacement, memory particle replacement, memory block replacement, memory row replacement, and memory storage cell replacement.

Correspondingly, the fault replacement conditions may include memory module fault replacement conditions, memory plane fault replacement conditions, memory particle fault replacement conditions, memory block fault replacement conditions, memory row fault replacement conditions, and memory storage cell fault replacement conditions. Further, the memory module fault replacement condition may specifically be that the number of CE errors occurring in the same memory module is greater than or equal to a first preset threshold, or that the number of CE errors of the same type occurring in the same memory module is greater than or equal to a second preset threshold; the memory module fault replacement condition may also be another condition. The values of the first preset threshold and the second preset threshold can be set flexibly according to the actual situation, and are not limited here. The meanings of the memory plane, memory particle, memory block, memory row, and memory storage cell fault replacement conditions are similar to that of the memory module fault replacement condition. They can be understood with reference to the foregoing description and are not repeated here.
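The threshold-based conditions described above can be illustrated with the following minimal sketch. It is only an illustration under stated assumptions: the threshold values, the function name `needs_replacement`, and the log format are hypothetical, and the sketch assumes that meeting either of the two conditions is sufficient, which the text does not require.

```python
from collections import Counter

# Hypothetical thresholds; the text leaves the actual values open.
FIRST_THRESHOLD = 3   # total CE errors in one storage unit
SECOND_THRESHOLD = 2  # CE errors of the same type in one storage unit


def needs_replacement(ce_log, unit):
    """Return True if `unit` meets either replacement condition.

    `ce_log` is a list of (unit, ce_type) tuples, one per reported CE error.
    """
    types_in_unit = [ce_type for u, ce_type in ce_log if u == unit]
    # Condition 1: total CE count reaches the first preset threshold.
    if len(types_in_unit) >= FIRST_THRESHOLD:
        return True
    # Condition 2: CE count of a single type reaches the second preset threshold.
    per_type = Counter(types_in_unit)
    return any(count >= SECOND_THRESHOLD for count in per_type.values())


ce_log = [
    ("module0", "access"),
    ("module0", "patrol"),
    ("module1", "access"),
    ("module1", "access"),
]
```

With this log, `module0` has two CE errors of different types and meets neither condition, while `module1` has two CE errors of the same type and meets the second condition.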

The fault replacement instruction carries at least the location information of the faulty storage unit and the location information of the backup storage unit. The location information of the faulty storage unit can be expressed as a character string, which is the code of the storage unit that is replaced; the location information of the backup storage unit can also be expressed as a character string, which is the code of the storage unit that replaces it. Optionally, the fault replacement instruction may also carry a fault replacement type, which can also be expressed as a character code. As an example, 00 indicates that the fault replacement type is memory block replacement, and 01 indicates that the fault replacement type is memory plane replacement; the possible codes are not exhaustively listed here.
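As an illustration only, the content of a fault replacement instruction could be modeled as follows. The class name, the field names, and the format of the location strings are hypothetical; only the two type codes 00 and 01 come from the example above.

```python
from dataclasses import dataclass
from typing import Optional

# Type codes from the example in the text; any further codes would be assumptions.
REPLACEMENT_TYPES = {
    "00": "memory block replacement",
    "01": "memory plane replacement",
}


@dataclass(frozen=True)
class FaultReplacementInstruction:
    faulty_unit: str                        # code of the storage unit being replaced
    backup_unit: str                        # code of the storage unit that takes over
    replacement_type: Optional[str] = None  # optional type code, e.g. "00"

    def describe(self) -> str:
        kind = REPLACEMENT_TYPES.get(self.replacement_type, "unspecified type")
        return f"replace {self.faulty_unit} with {self.backup_unit} ({kind})"


# The location string format below is a made-up encoding for illustration.
instr = FaultReplacementInstruction("rank10/dev5/bank14", "rank18/dev13/bank22", "00")
```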

202. The memory controller performs fault replacement processing on the storage unit in the memory according to the received fault replacement instruction, and writes the fault replacement information into the first register.

In the embodiment of the present application, when the memory controller receives the fault replacement instruction, it can determine, according to the location information of the faulty storage unit and the location information of the backup storage unit, which storage unit in the memory needs to be replaced and isolated, and the location of the backup storage unit that is used after the replacement. The memory controller then performs fault replacement processing on the storage unit in the memory. In a fault replacement operation, the memory controller reads out the data in the faulty storage unit and writes it into the backup storage unit. The memory controller may be integrated in the processor, or may be a device separate from the processor. Optionally, in a fault replacement operation, the memory controller also needs to reorganize the storage units in the memory.

It should be noted that the faulty storage unit must be located in the memory, but the backup storage unit is not necessarily located in the memory. When the granularity of the faulty storage unit is a memory storage cell, a backup storage unit for storing the data of the faulty memory storage cell may be integrated in the first module. That is, when the granularity of a faulty storage unit is a memory storage cell, the memory controller writes the data in the faulty memory storage cell into the backup storage unit in the first module. When the granularity of the faulty storage unit is a memory row, a memory block, a memory particle, a memory plane, or a memory module, the corresponding backup storage unit can be set in the memory.

In order to understand the fault replacement technology more intuitively, please refer to FIG. 3, which is a schematic diagram of the fault replacement technology in the reset method provided by the embodiment of the present application. FIG. 3 takes the case in which the storage unit to be replaced is a memory particle as an example, and includes three sub-schematic diagrams (a), (b) and (c). Sub-schematic diagram (a) represents the data distribution in the memory module before the fault replacement operation. As shown in sub-schematic diagram (a), a memory module includes two memory planes (Rank A and Rank B). Each memory plane includes 18 memory particles: 16 particles for normal data storage, one ECC error correction particle, and one parity check bit particle. The ECC error correction particle can also be regarded as a backup particle. When a faulty particle that meets the fault replacement condition appears among particles 0 to 15, the data in the faulty particle is written into the ECC error correction particle to replace and isolate the faulty particle. Sub-schematic diagram (b) indicates that particle 1 in Rank A is a faulty particle, so the data of particle 1 in Rank A needs to be written into the ECC error correction particle of Rank A; however, Rank A then loses its error correction ability. So that Rank A still has error correction ability after the particle is replaced, Rank A and Rank B can be reorganized to share the ECC error correction particle of Rank B. As a result, the two 16+2 storage modes in sub-schematic diagram (a) become the 32+3 mode in sub-schematic diagram (c). It should be understood that the example in FIG. 3 is only to facilitate understanding of the fault replacement technology, and is not used to limit the solution.
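The particle counts in the FIG. 3 example can be checked with a small bookkeeping sketch. The particle names and the helper functions below are hypothetical; the sketch only tracks the role of each particle and does not model ECC or parity computation.

```python
def make_rank(name):
    """Build one 16+2 rank: 16 data particles, one ECC particle, one parity particle."""
    roles = {f"{name}-{i}": "data" for i in range(16)}
    roles[f"{name}-ecc"] = "ecc"
    roles[f"{name}-parity"] = "parity"
    return roles


def replace_with_ecc(roles, rank, faulty_index):
    """Isolate a faulty data particle and let the rank's ECC particle hold its data."""
    roles = dict(roles)  # work on a copy so the original layout is kept
    roles[f"{rank}-{faulty_index}"] = "isolated"
    roles[f"{rank}-ecc"] = "data"
    return roles


def summarize(roles):
    """Return (number of data particles, number of ECC and parity particles)."""
    data = sum(1 for r in roles.values() if r == "data")
    support = sum(1 for r in roles.values() if r in ("ecc", "parity"))
    return data, support


module = {**make_rank("A"), **make_rank("B")}
before = summarize(module)
after = summarize(replace_with_ecc(module, "A", 1))
```

Before the replacement the module holds two 16+2 ranks, that is, 32 data particles and 4 ECC or parity particles. After particle 1 of Rank A is isolated and the ECC particle of Rank A takes over its data, 32 data particles and 3 ECC or parity particles remain, which matches the 32+3 mode.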

After performing fault replacement processing on the storage unit in the memory, the memory controller writes the fault replacement information into the first register. There may be multiple fault replacement operations during the operation of the reset system, and one piece of fault replacement information records one fault replacement operation. The fault replacement information includes the location information of the first storage unit and the location information of the second storage unit. The first storage unit is the storage unit in which the CE error occurred when fault replacement was performed on the storage units in the memory (that is, the storage unit that is replaced), and the second storage unit is the backup storage unit used in that fault replacement (that is, the storage unit used after the replacement). Therefore, the fault replacement information can reflect how the data in the memory is distributed among the storage units after fault replacement processing has been performed on the faulty storage unit in the memory.

Optionally, a piece of fault replacement information may also include the granularity level of the faulty storage unit, the CE error type of the replaced faulty storage unit, or other types of information. In the embodiment of the present application, the fault replacement information includes at least the location information of the storage unit that is replaced and the location information of the storage unit that replaces it. That is, the fault replacement operations that occur in the memory are recorded through the fault replacement information, which reflects not only which storage units in the current memory have been replaced and isolated due to faults, but also which storage unit holds the data after the fault replacement, and thus intuitively reflects the distribution of the data in the memory among the storage units.

In order to further understand the concept of fault replacement information, the following takes the granularity level of the storage unit for fault replacement as a memory block as an example, and further introduces the fault replacement information through Table 1.

Table 1

In Table 1, region0 means that the fault replacement operation corresponding to the fault replacement information occurred in the region numbered 0 in the memory. region0-enable is a field in the fault replacement information that indicates whether a fault replacement has been performed in region0, and the code 0 in region0-enable indicates that a fault replacement has been performed in region0. region0-size is a field in the fault replacement information that indicates the granularity of the storage unit to be replaced in region0, and the code 00 in region0-size indicates that the granularity of the storage unit to be replaced in region0 is a memory block (bank). region0-rank indicates the number of the memory plane in which the memory block to be replaced in region0 is located, region0-device indicates the number of the memory particle in which that memory block is located, and region0-bank indicates the number of that memory block. region0-rank, region0-device, and region0-bank together indicate the location of the first storage unit; as shown in Table 1, the first storage unit is the memory block numbered 14 in the memory particle numbered 5 in the memory plane numbered 10 in region0. region0-buddy-rank indicates the number of the memory plane in which the backup memory block is located, region0-buddy-device indicates the number of the memory particle in which the backup memory block is located, and region0-buddy-bank indicates the number of the backup memory block. region0-buddy-rank, region0-buddy-device, and region0-buddy-bank together indicate the location of the second storage unit; as shown in Table 1, the second storage unit is the memory block numbered 22 in the memory particle numbered 13 in the memory plane numbered 18 in region0.
It should be understood that, in actual situations, the fault replacement information may include more or less information, and the examples in Table 1 are only to facilitate understanding of the concept of fault replacement information, and are not used to limit the solution.
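To make the fields described for Table 1 more concrete, the following sketch models one piece of fault replacement information and uses it to redirect an access from the first storage unit to the second storage unit. The class names and the `resolve` function are hypothetical, the numeric values are the ones given in the description of Table 1, and the sketch does not interpret the region0-enable code or assume any register bit layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BankLocation:
    rank: int
    device: int
    bank: int


@dataclass(frozen=True)
class FaultReplacementInfo:
    enable: int           # region0-enable
    size: str             # region0-size, "00" means memory block (bank) granularity
    faulty: BankLocation  # region0-rank / region0-device / region0-bank
    buddy: BankLocation   # region0-buddy-rank / -device / -bank


def resolve(location, records):
    """Redirect an access that targets a replaced bank to its backup bank."""
    for record in records:
        if record.faulty == location:
            return record.buddy
    return location  # not replaced: access the original location


region0 = FaultReplacementInfo(
    enable=0,
    size="00",
    faulty=BankLocation(rank=10, device=5, bank=14),
    buddy=BankLocation(rank=18, device=13, bank=22),
)
```

An access to bank 14 of particle 5 in plane 10 is redirected to bank 22 of particle 13 in plane 18, while accesses to other banks are left unchanged.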

The first register belongs to the first module, and the fault replacement information is stored in the first register. The first register may specifically be a status register, a configuration register, or another type of register, which is not limited here. The first module can be integrated in the memory controller. Further, one first register stores one piece of fault replacement information, and the first module can be configured with multiple first registers to record multiple pieces of fault replacement information.

Optionally, the first module may also be configured with at least one second storage unit. When a first storage unit is a faulty storage unit whose granularity is a memory storage cell, the second storage unit in the first module is used to store the data of that first storage unit. In the embodiment of the present application, since the reset instruction is not sent to the first module during the reset process, the data in the backup storage unit is not cleared after the reset operation is completed, thereby ensuring the integrity of the data.

Further, the granularity of the backup storage unit configured in the first module may be a memory storage cell, a memory row, or another granularity, and the number of backup storage units in the first module may be 32, 64, 128, or another quantity.

Optionally, a second register may also be configured in the reset system. The second register is used to record the state of the memory controller in the fault replacement operation. The state may be: no fault replacement operation, fault replacement operation in progress, fault replacement operation succeeded, fault replacement operation failed, or another type of state, which is not limited here. Further, the second register can also be integrated in the memory controller. Further, one or more sets of registers may be configured in the reset system, and each set of registers includes a first register and a second register.
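The states recorded in the second register can be sketched as a small state holder paired with a first register. The class and method names are hypothetical, and the order in which the registers are updated is an assumption made for illustration.

```python
from enum import Enum


class ReplacementStatus(Enum):
    NONE = "no fault replacement operation"
    IN_PROGRESS = "fault replacement operation in progress"
    SUCCESS = "fault replacement operation succeeded"
    FAILURE = "fault replacement operation failed"


class RegisterSet:
    """One set of registers: a first register and a second register."""

    def __init__(self):
        self.first = None                     # fault replacement information
        self.second = ReplacementStatus.NONE  # state of the replacement operation

    def run_replacement(self, info, copy_data):
        """Update both registers around one replacement.

        `copy_data` is a callable that moves the data to the backup storage
        unit and returns True on success.
        """
        self.second = ReplacementStatus.IN_PROGRESS
        if copy_data():
            self.first = info
            self.second = ReplacementStatus.SUCCESS
        else:
            self.second = ReplacementStatus.FAILURE
        return self.second
```

In this sketch the first register is written only when the data copy succeeds, so a reader of the second register that sees the success state can rely on the first register holding the matching record.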

203. The reset control circuit obtains the reset signal.

In the embodiment of the present application, a reset operation may be required during the operation of the reset system, so that the reset control circuit can obtain the reset signal. Among them, reset refers to restoring the state of the reset module/unit/device to the state of power-on for the first time. The reset control circuit can be integrated in the processor. Optionally, after receiving the reset signal, the reset control circuit can determine whether the received reset signal is a warm reset signal or a cold reset signal. Among them, the cold reset signal is generally caused by a memory failure and is used to trigger a cold reset operation. The cold reset operation refers to the need to restore the entire reset system and the memory to the first power-on state, which can generally be performed by powering on and off. The warm reset signal is generally caused by a non-memory fault and is used to trigger a warm reset operation. The warm reset operation refers to not resetting some modules/units/devices during the resetting process of the system.

Specifically, in one implementation, the reset control circuit may include a first pin and a second pin. If the reset control circuit obtains the reset signal from the first pin, the reset signal is a cold reset signal; if the reset control circuit obtains the reset signal from the second pin, the reset signal is a warm reset signal. In this implementation, the reset signal may be represented as a group of low-level signals, and the group may include one or more low-level signals. In another implementation, the reset control circuit obtains the cold reset signal and the warm reset signal from the same signal source, and the cold reset signal and the warm reset signal are represented as different electrical signals. As an example, the cold reset signal is represented as a 01 signal, a 0101 signal, or a 0011 signal, and the warm reset signal is represented as a 10 signal, a 1010 signal, or a 1100 signal, where "0" refers to a low-level signal and "1" refers to a high-level signal. Therefore, the reset control circuit can determine whether it has received a cold reset signal or a warm reset signal according to the form of the received electrical signal. It should be understood that the examples of the cold reset signal and the warm reset signal here are only to facilitate understanding of the solution, and are not used to limit the solution.
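The two ways of distinguishing a cold reset signal from a warm (hot) reset signal can be sketched as follows. The function names are hypothetical, and the pattern sets simply collect the example codes given above; an actual design would presumably use a single pair of codes.

```python
# Example codes from the text: "0" is a low-level signal, "1" is a high-level signal.
COLD_PATTERNS = {"01", "0101", "0011"}
WARM_PATTERNS = {"10", "1010", "1100"}


def classify_by_pattern(levels):
    """Single signal source: classify the reset by the form of the electrical signal."""
    if levels in COLD_PATTERNS:
        return "cold"
    if levels in WARM_PATTERNS:
        return "warm"
    raise ValueError(f"unrecognized reset signal: {levels!r}")


def classify_by_pin(pin):
    """Two pins: the first pin carries cold resets, the second pin carries warm resets."""
    if pin == 1:
        return "cold"
    if pin == 2:
        return "warm"
    raise ValueError(f"unknown reset pin: {pin!r}")
```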

204. The reset control circuit sends a reset instruction to the second module. The second module includes the processor core and does not include the first module.

In some embodiments of the present application, after obtaining the reset signal, the reset control circuit, in response to the obtained reset signal, controls the processor core to perform the reset operation, controls the first module not to perform the reset operation, and also controls the memory not to perform the reset operation. That is, the reset control circuit sends a reset instruction to the second module, and the second module includes the processor core and does not include the first module. Optionally, the second module may also include other modules in the reset system except the first module, as long as it is ensured that the first module and the memory do not perform a reset operation. It should be noted that the second module can be an artificially divided conceptual module.

Specifically, in response to the acquired reset signal, the reset control circuit sends a reset instruction to the processor core, but does not send a reset instruction to the first module or the memory. The reset instruction is used to trigger the execution of the reset operation, so that the processor core is controlled to perform the reset operation and the first module is controlled not to perform the reset operation. As a result, the data stored in the first module is not reset, that is, the data stored in the first module is not cleared. The reset instruction may be a set of low-level signals that includes at least one low-level signal; the reset instruction may also be a set of electrical signals that includes both low-level signals and high-level signals, which is not limited here.

Further, when the acquired reset signal is a warm reset signal, the reset control circuit sends a reset instruction to the processor core, but does not send a reset instruction to the first module or the memory. When the acquired reset signal is a cold reset signal, the reset control circuit sends a reset instruction to the processor core, the first module, and the memory. That is, the first module and the memory are controlled not to perform the reset operation only when the reset control circuit obtains a warm reset signal.
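The selection of reset targets according to the type of reset signal can be illustrated with the following sketch. The `Component` class and the `handle_reset` function are hypothetical stand-ins, and setting the state to `None` is only a placeholder for restoring a component to its first power-on state.

```python
class Component:
    def __init__(self, name, state):
        self.name = name
        self.state = state

    def reset(self):
        # Placeholder for "restored to the first power-on state".
        self.state = None


def handle_reset(signal, core, first_module, memory):
    """Send a reset instruction only to the components the signal type allows."""
    if signal == "cold":
        targets = [core, first_module, memory]
    elif signal == "warm":
        targets = [core]  # the first module and the memory are left untouched
    else:
        raise ValueError(f"unknown reset signal type: {signal!r}")
    for component in targets:
        component.reset()
    return [component.name for component in targets]
```

After a warm reset in this sketch, the state held by the first module (the fault replacement information) and by the memory is unchanged, while a cold reset clears all three components.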

Optionally, the reset control circuit may also send a first instruction to the first module and to the memory, and the first instruction instructs the receiver not to perform the reset operation. In this case, the processor core performs the reset operation after receiving the reset instruction, and the first module and the memory do not perform the reset operation after receiving the first instruction. In this way, the processor core is controlled to perform the reset operation, and the first module and the memory are controlled not to perform the reset operation.

Specifically, regarding how the reset control circuit sends the first instruction to the first module: in one case, the reset instruction and the first instruction can be represented as two different electrical signals, so the reset control circuit can send different electrical signals to the processor core and the first module, thereby sending the reset instruction to the processor core and the first instruction to the first module. Correspondingly, the first module can determine whether it has received the first instruction according to the type of the received electrical signal. As an example, the reset instruction is 111000 and the first instruction is 000111, where "0" refers to a low-level signal and "1" refers to a high-level signal. In another case, a third pin and a fourth pin may be provided in the first module. If the reset control circuit is to send a reset instruction to the first module, it sends the instruction to the third pin; correspondingly, if the first module obtains an instruction through the third pin, the obtained instruction is regarded as a reset instruction. If the reset control circuit is to send the first instruction to the first module, it sends the instruction to the fourth pin; correspondingly, if the first module obtains an instruction through the fourth pin, the obtained instruction is regarded as the first instruction.
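For the case in which the two instructions are different electrical signals, the behavior of the first module can be sketched as follows. The class name `FirstModule` and the return values are hypothetical; the codes 111000 and 000111 are the example values given above.

```python
RESET_INSTRUCTION = "111000"  # example code for the reset instruction
FIRST_INSTRUCTION = "000111"  # example code for the first instruction (do not reset)


class FirstModule:
    def __init__(self, records):
        self.records = list(records)  # fault replacement information held in first registers

    def receive(self, signal):
        """Decide by the form of the received signal whether to reset."""
        if signal == RESET_INSTRUCTION:
            self.records.clear()  # reset: the stored information is cleared
            return "reset"
        if signal == FIRST_INSTRUCTION:
            return "hold"         # first instruction: keep the stored information
        raise ValueError(f"unrecognized instruction: {signal!r}")
```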

The implementation manner of the reset control circuit sending the first instruction to the memory is similar to the implementation manner of the reset control circuit sending the first instruction to the first module, and will not be repeated here.

Further, when the acquired reset signal is a warm reset signal, the reset control circuit sends a reset instruction to the processor core and sends the first instruction to the first module and the memory. When the acquired reset signal is a cold reset signal, the reset control circuit sends a reset instruction to the processor core, the first module, and the memory. That is, the first module and the memory are controlled not to perform the reset operation only when the reset control circuit obtains a warm reset signal.

Further optionally, if the first module is integrated in the memory controller, step 204 may include: after obtaining the reset signal, the reset control circuit, in response to the obtained reset signal, sends a reset instruction to the processor core and does not send a reset instruction to the memory controller. That is, the reset control circuit sends a reset instruction to the second module, and the second module includes the processor core and does not include the memory controller. The specific implementation in which the reset control circuit controls the processor core to perform the reset operation is the same as described above. The specific implementation in which the reset control circuit controls the memory controller not to perform the reset operation is similar to the above description, except that the object in the above description is the first module, whereas the object in this implementation is the entire memory controller; details are not repeated here. In the embodiment of the present application, the fault replacement information recorded in the first module indicates the distribution of the data in the memory among the storage units, and the memory controller is used to manage the memory. Integrating the first module into the memory controller therefore makes it easier for the memory controller to manage the first module and to read the fault replacement information in order to manage the memory. In addition, controlling the entire memory controller not to reset avoids the problem of different modules in the memory controller being out of sync after the reset.

205. The reset control circuit sends a reset instruction to the processor core and the first module.

In some embodiments of the present application, when the reset control circuit determines that the acquired reset signal is a cold reset signal, it sends a reset instruction to the processor core, the first module, and the memory to control the processor core, the first module, and the memory to perform a reset operation. Optionally, the reset control circuit may also send a reset instruction to other modules in the reset system.

Further, the reset control circuit may include a logic circuit. When the reset control circuit obtains a warm reset signal, the output terminal of the reset control circuit is not coupled with the first module; when the reset control circuit obtains a cold reset signal, the output terminal of the reset control circuit is coupled with the first module.

In the embodiment of this application, when the reset control circuit obtains the cold reset signal, this indicates that the reset operation was triggered by a memory fault. In this case the memory needs to be reset, that is, the data in the memory will be cleared, so there is no need to ensure that the data in the memory is not lost. The first module also performs the reset operation, so that after the reset operation is completed, new fault replacement information can be written to the first module again, which keeps the entire reset system in sync.

Optionally, the first module is integrated in the memory controller. When the reset control circuit determines that the acquired reset signal is a cold reset signal, the reset control circuit controls the processor core and the memory controller to perform the reset operation, and also controls the memory to perform the reset operation. The specific implementation is similar to the above, with the difference that the first module in the above description is replaced with the memory controller, which is not repeated here.

To further understand this solution, please refer to FIG. 4, which is a system schematic diagram of a reset system provided by an embodiment of this application. FIG. 4 takes the case in which the first module is integrated in the memory controller and the memory controller is integrated in the processor as an example. When the reset control circuit obtains a cold reset signal, the reset control circuit sends a reset instruction to the processor core, the memory controller, the HSPHY, and the memory to trigger the entire reset system and the memory to perform a reset operation. When the reset control circuit obtains a warm reset signal, the reset control circuit sends a reset instruction to the processor core, and does not send a reset instruction to the memory controller, the HSPHY, or the memory, so as to control the first module not to perform the reset operation. It should be understood that the example in FIG. 4 is only for a more intuitive understanding of the solution, and is not used to limit the solution.

In the embodiment of this application, a new concept of fault replacement information is proposed, and a first register dedicated to storing fault replacement information is added to the reset system. During the reset operation, the first module is controlled not to reset, so that the fault replacement information in the first register is not cleared after the reset operation is completed. Even if some storage units in the memory have been isolated and replaced by fault replacement processing, after the system is reset it can be determined from the fault replacement information which storage units in the memory are isolated faulty storage units. This avoids system downtime caused by accessing an isolated faulty storage unit and allows the memory to be accessed correctly, so that the data in the memory is not lost when both the fault replacement technology and the reset technology are used in the memory.

2. Failure replacement information is backed up

In the embodiment of this application, please refer to FIG. 5. FIG. 5 is a schematic diagram of a work flow of the data processing system provided in the embodiment of this application. The work flow of the data processing system provided in the embodiment of this application may include:

501. The processor core sends a fault replacement instruction to the memory controller.

502. The memory controller performs fault replacement processing on the storage unit in the memory according to the received fault replacement instruction, and writes the fault replacement information into the first register.

In the embodiment of the present application, the specific implementation manners of steps 501 and 502 are similar to the specific implementation manners of steps 201 and 202 in the embodiment corresponding to FIG.

503. The processor core writes the fault replacement information into the non-volatile storage medium.

In some embodiments of the present application, after the memory controller writes the fault replacement information into the first register in the first module, the processor core can read the fault replacement information from the first register in the first module, and write the newly generated fault replacement information into the non-volatile storage medium. The concepts of the first module and the fault replacement information have been introduced in the embodiment corresponding to FIG. 2 and are not repeated here. The non-volatile storage medium may specifically be a hard disk, a complex programmable logic device (CPLD), an electrically erasable programmable read only memory (EEPROM), or another type of non-volatile storage medium. The non-volatile storage medium may be configured in the same device as the processor core, or in a different device. The processor core and the non-volatile storage medium can exchange data through an internal interface or an external interface. The internal interface includes but is not limited to a bus, and the external interface includes a wired communication interface and a wireless communication interface.

Specifically, regarding how the processor core reads the fault replacement information from the first register: after the memory controller writes the fault replacement information into the first register, it indicates to the processor core that the fault replacement operation is complete, and the processor core reads the fault replacement information from the first module after learning of the completion.

More specifically, referring to the description in step 202 in the embodiment corresponding to FIG. 2, a second register is configured in the data processing system. After the memory controller writes the fault replacement information into the first module, it writes information indicating that the fault replacement operation succeeded into the second register (that is, it indicates that the fault replacement operation is complete). After reading this information from the second register, the processor core determines that the memory controller has completed the fault replacement operation, and copies the fault replacement information from the first register.
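The backup flow described above can be sketched as follows, using a JSON file in a temporary directory as a stand-in for the non-volatile storage medium. The function names, the status string, and the record layout are hypothetical, and the `restore` helper is included only to check the round trip.

```python
import json
import os
import tempfile


def backup_if_complete(second_register, first_register, path):
    """Copy the record to `path` only once the second register reports success."""
    if second_register != "success":
        return False  # replacement not finished: nothing is backed up yet
    with open(path, "w", encoding="utf-8") as f:
        json.dump(first_register, f)
    return True


def restore(path):
    """Read the backed-up fault replacement information back from `path`."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Hypothetical record: [rank, device, bank] of the first and second storage units.
record = {"faulty": [10, 5, 14], "buddy": [18, 13, 22]}
backup_path = os.path.join(tempfile.mkdtemp(), "fault_replacement.json")
```

Calling `backup_if_complete("success", record, backup_path)` writes the record, and `restore(backup_path)` then returns the same content, which models the information surviving a reset of the first module.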

504. The processor core writes the first data stored in the second storage unit in the first module into the non-volatile storage medium.

In some embodiments of the present application, the first module may further include at least one second storage unit, and the second storage unit in the first module is used to store the first data of a first storage unit whose granularity is a memory storage cell.

When the granularity of a faulty storage unit is a memory storage cell, the memory controller can write the first data in the faulty storage unit into the second storage unit (that is, the backup storage unit) in the first module. After the memory controller writes the fault replacement information into the first register in the first module, the processor core can read the first data from the second storage unit included in the first module and write the first data into the non-volatile storage medium, so that the first data is not lost when the processor core and the first module perform a reset operation.

Specifically, the processor core reads the first data from the backup storage unit in the first module as follows. After the memory controller writes the fault replacement information into the first register, it presents to the processor core a signal indicating that the fault replacement is complete. After the processor core learns of the completion signal, it reads the first data from the backup storage unit in the first module. The specific manner in which the processor core determines that the memory controller has completed the fault replacement operation has been introduced in step 503 and is not repeated here.

It should be noted that step 504 is optional. If there is no faulty storage unit whose granularity is a memory storage cell, step 504 does not need to be performed. If step 504 is performed, the embodiment of the present application does not limit the execution order of step 503 and step 504: step 503 can be performed before step 504, step 504 can be performed before step 503, or steps 503 and 504 can be performed at the same time.
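The optional nature of step 504 can be sketched as follows. The record fields `granularity` and `backup_slot`, and the use of plain dictionaries for the second storage units and the non-volatile storage medium, are hypothetical choices for this illustration.

```python
def backup_first_data(records, second_storage_units, nv_store):
    """Persist backup-unit contents for cell-granularity faults only (step 504 sketch)."""
    saved = 0
    for record in records:
        if record["granularity"] != "cell":
            continue  # only cell-granularity faults keep their data in the first module
        slot = record["backup_slot"]
        nv_store[slot] = second_storage_units[slot]
        saved += 1
    return saved


records = [
    {"granularity": "cell", "backup_slot": 0},
    {"granularity": "particle", "backup_slot": None},
]
second_storage_units = {0: b"\x5a"}  # first data held in the first module
nv_store = {}
saved_count = backup_first_data(records, second_storage_units, nv_store)
```

When no record has cell granularity the function writes nothing, which corresponds to skipping step 504.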

505. The reset control circuit obtains a reset signal.

In the embodiment of the present application, the specific implementation of step 505 is similar to the specific implementation of step 203 in the embodiment corresponding to FIG.

506. The reset control circuit sends a reset instruction to the processor core and the first module.

In some embodiments of the present application, after the reset control circuit obtains the reset signal, regardless of whether the obtained reset signal is a warm reset signal or a cold reset signal, the reset control circuit sends a reset instruction to the processor core and the first module to trigger the processor core and the first module to perform a reset operation. Further, if the obtained signal is a warm reset signal, the reset control circuit does not send a reset instruction to the memory, so that the memory does not perform the reset operation; if the obtained signal is a cold reset signal, the reset control circuit sends a reset instruction to the memory, so that the memory performs the reset operation. The form of the reset instruction has been introduced in the embodiment corresponding to FIG. 2 and is not repeated here. It should be noted that although the first module in FIG. 5 is integrated in the memory controller, in actual situations the first module may also be provided outside the memory controller, which is not limited here.

Optionally, if the first module is integrated in the memory controller and the memory controller is integrated in the processor, the entire data processing system may take the form of a processor, and the reset control circuit may send a reset instruction to the entire processor to control the entire processor to perform a reset operation.

507. The processor core determines whether the reset operation is a warm reset operation; if it is a warm reset operation, go to step 508, and if it is a cold reset operation, go to step 509.

In some embodiments of the present application, a third register is further provided in the reset control circuit, and the third register is used to record whether the reset signal obtained by the reset control circuit this time is a cold reset signal or a warm reset signal. After receiving the reset instruction sent by the reset control circuit, the processor core queries the information recorded in the third register to determine whether the reset signal that triggered the reset operation is a warm reset signal, that is, whether the reset operation is a warm reset operation.
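As a rough illustration of step 506, the following sketch lists which components receive a reset instruction for each kind of reset signal. The string names for the signals and components are assumptions chosen for readability.

```python
def reset_targets(signal):
    """Return the components that receive a reset instruction for a given reset signal."""
    if signal not in ("warm", "cold"):
        raise ValueError(f"unknown reset signal: {signal!r}")
    targets = ["processor_core", "first_module"]  # reset for both signal types
    if signal == "cold":
        targets.append("memory")  # only a cold reset also resets the memory
    return targets


warm_targets = reset_targets("warm")
cold_targets = reset_targets("cold")
```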
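The query of the third register in step 507 might look like the sketch below. The `ResetType` enumeration, the class layout, and the returned step numbers (508 for the warm branch, 509 for the cold branch described later in the text) are illustrative assumptions.

```python
from enum import Enum


class ResetType(Enum):
    WARM = "warm"
    COLD = "cold"


class ResetControlCircuit:
    """Hypothetical model: the third register records the latest reset signal type."""

    def __init__(self):
        self.third_register = None

    def on_reset_signal(self, reset_type):
        self.third_register = reset_type  # recorded before the reset instruction is sent
        return "reset_instruction"


def next_step(circuit):
    """Step 507 sketch: choose the branch based on the third register."""
    if circuit.third_register is None:
        raise RuntimeError("no reset signal has been recorded")
    return 508 if circuit.third_register is ResetType.WARM else 509


circuit = ResetControlCircuit()
circuit.on_reset_signal(ResetType.WARM)
step_after_warm = next_step(circuit)
circuit.on_reset_signal(ResetType.COLD)
step_after_cold = next_step(circuit)
```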

508. The processor core performs a reset operation on the processor core and the first module.

In some embodiments of the present application, initialization software runs in the processor core. If the reset operation is determined to be a warm reset operation, the initialization software in the processor core needs to perform a reset operation on the processor core and the first module. During the reset process, the initialization software in the processor core obtains the failure replacement information set from the non-volatile storage medium. Since more than one failure replacement operation can occur while the data processing system is running, and one piece of failure replacement information records the replacement of storage units in one failure replacement operation, what the processor core obtains from the non-volatile storage medium can be a failure replacement information set that includes one or more pieces of failure replacement information. The initialization software may specifically take the form of a basic input output system (BIOS).

Optionally, if step 504 is performed, the initialization software in the processor core also obtains the first data from the non-volatile storage medium during the reset and startup process.

Specifically, the processor core can perform the reset operation on the first module in the following ways. In one implementation, the initialization software in the processor core backfills the failure replacement information set to the first register in the process of resetting the first register. In the embodiment of the present application, the processor core obtains the failure replacement information set from the non-volatile storage medium and directly backfills the failure replacement information to the first module while the first module is being reset, so that after the data processing system is reset, the memory controller can directly use the fault replacement information in the first module to access the memory accurately, which is simple to operate and easy to implement.

More specifically, after the initialization software in the processor core triggers the reset operation on the processor core and the first module, the initialization software, in the process of resetting the first registers, backfills each piece of failure replacement information into one of a plurality of first registers respectively. Since the configuration register only supports hardware writing, and the status register supports both hardware writing and software writing, the first register is specifically represented as a status register in this implementation.

Optionally, if step 504 is performed, the initialization software in the processor core, during the reset operation of the first module, backfills the failure replacement information set to the first register and backfills the first data to the second storage unit in the first module. The manner in which the processor core backfills the first data to the second storage unit in the first module is similar to the manner of backfilling the fault replacement information to the first register and is not repeated here. In this implementation, the first data stored in the second storage unit in the first module is also written into the non-volatile storage medium, and when the first module is reset, the first data is backfilled to the first module, which ensures that the first data is not lost and thereby ensures the integrity of the data.

In another implementation, the initialization software in the processor core performs a reset operation on the first module to initialize the first module, and, according to the failure replacement information set, performs a reverse replacement operation on the data in the storage units of the memory. The reverse replacement operation is used to write the data in the second storage unit into the first storage unit, so that the distribution of the data in the memory across the storage units is restored to the initial state. Restoring the distribution of the data to the initial state does not mean clearing the data in the memory, but means storing the data in the memory according to the storage mode used before the failure replacement was performed. In the embodiments of the present application, a storage unit in the memory may meet the failure replacement condition because of a processor core failure or a memory controller failure; that is, after the processor core and the memory controller are reset, a storage unit that met the failure replacement condition may become usable again. Therefore, after the processor core and the memory controller are reset, the reverse replacement operation is performed on the data in the storage units of the memory, which also releases the backup storage unit and helps to extend the service life of the memory.

More specifically, the initialization software in the processor core performs a reset operation on the first module, so that after the first module is initialized, the failure replacement information set recorded in the first module is cleared. Since each piece of failure replacement information records the replacement relationship between one first storage unit and one second storage unit, the initialization software in the processor core can learn the location of the first storage unit and the location of the second storage unit from the failure replacement information, and then rewrite the data stored in the second storage unit into the first storage unit, that is, perform the reverse replacement operation on the data in the storage units of the memory.

Further, if the granularity of the faulty storage unit is a memory particle, the initialization software in the processor core also needs to use the data in the parity check particle to verify the data in the second storage unit. If an error is found in the data in the second storage unit, the initialization software uses the data in the ECC error correction particle to correct the data in the second storage unit, and then rewrites the corrected data in the second storage unit into the first storage unit.
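The backfill during a warm reset can be sketched as follows, with one record per first register. The list-based register model, the record fields, and the error raised when there are more records than registers are assumptions made for this illustration.

```python
def backfill_first_registers(record_set, register_count):
    """Reset the first registers, then backfill one fault replacement record per register."""
    if len(record_set) > register_count:
        raise ValueError("more records than first registers")
    registers = [None] * register_count  # the reset clears every first register
    for index, record in enumerate(record_set):
        registers[index] = dict(record)  # software write, as for a status register
    return registers


record_set = [
    {"first_unit": 0x10, "second_unit": 0xF0},
    {"first_unit": 0x24, "second_unit": 0xF1},
]
registers = backfill_first_registers(record_set, register_count=4)
```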
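A minimal sketch of the reverse replacement operation is given below. The memory is modeled as a dictionary keyed by hypothetical unit names, and marking the backup unit as `None` after the copy is an assumption standing in for releasing the backup storage unit.

```python
def reverse_replace(memory, record_set):
    """Move data from each backup (second) unit back to the unit it replaced."""
    for record in record_set:
        first, second = record["first_unit"], record["second_unit"]
        memory[first] = memory[second]  # rewrite the data into the first storage unit
        memory[second] = None           # assumption: the backup unit is released for reuse
    return memory


memory = {"A1": None, "A_ecc": b"payload", "B1": b"other"}
record_set = [{"first_unit": "A1", "second_unit": "A_ecc"}]
reverse_replace(memory, record_set)
```

Units that do not appear in any record are left untouched, which matches the statement that restoring the initial distribution does not clear the memory.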
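The verify-then-correct flow for particle granularity can be sketched as below. The per-byte even parity is a deliberately simple stand-in for the parity check particle, and the `correct` callable is a hypothetical placeholder for ECC-based correction, which the text does not describe in detail.

```python
def parity_bits(data):
    """Even-parity bit for each byte (a simplified stand-in for the parity check particle)."""
    return [bin(byte).count("1") % 2 for byte in data]


def restore_with_check(backup_data, stored_parity, correct):
    """Verify the backup data, correct it if needed, and return the data to write back."""
    if parity_bits(backup_data) != stored_parity:
        backup_data = correct(backup_data)  # hypothetical ECC-based correction
        if parity_bits(backup_data) != stored_parity:
            raise ValueError("data still fails the parity check after correction")
    return bytes(backup_data)


good = bytes([0b1010, 0b0111])
stored_parity = parity_bits(good)
corrupted = bytes([0b1011, 0b0111])  # one flipped bit in the first byte
restored = restore_with_check(corrupted, stored_parity, correct=lambda _data: good)
```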

Correspondingly, the reverse replacement operation also needs to reorganize the data in the memory particles.

To further understand this solution, please refer to FIG. 6, which is a schematic diagram of the reverse replacement operation in the data processing method provided by the embodiment of the application, continuing the example of FIG. 3. FIG. 6 includes two sub-schematic diagrams, (a) and (b). Sub-schematic diagram (a) represents the data distribution in a memory bar before the reverse replacement operation. As shown in (a), after the fault replacement operation is performed, the data of particle 1 in Rank A has been written into the ECC error correction particle of Rank A, and Rank A and Rank B share the ECC error correction particle of Rank B; the reverse replacement operation needs to rewrite the data in the ECC error correction particle of Rank A into particle 1 of Rank A. Sub-schematic diagram (b) represents the data distribution in the memory bar after the reverse replacement operation. After the processor core uses the parity check particle of Rank A to verify the data in the ECC error correction particle of Rank A and finds no error in that data, it reads the data in the ECC error correction particle of Rank A and writes it into particle 1 of Rank A. The processor core also reorganizes the data of Rank A and Rank B, that is, the data storage mode in Rank A and Rank B is changed back to two 16+2 storage modes, so that the distribution of the data in the memory across the storage units is restored to the initial state. It should be understood that the example in FIG. 6 is only intended to facilitate understanding of the fault replacement technology and is not used to limit the solution.

It should be noted that the embodiment of the present application does not limit how many times steps 501 to 504 are executed relative to steps 505 to 508; for example, steps 505 to 508 may be executed once after steps 501 to 504 have been executed multiple times.

509. The processor core does not obtain the failure replacement information set from the non-volatile storage medium.

In some embodiments of the present application, when the processor core determines that this reset operation is a cold reset operation, the processor core does not obtain the failure replacement information set from the non-volatile storage medium, but directly performs a reset operation on the processor core, the first module, the memory controller, and the memory, that is, initializes the entire data processing system.

In the embodiment of this application, the fact that the reset control circuit obtained a cold reset signal indicates that the reset operation was triggered by a memory fault. In this case the memory needs to be reset, that is, the data in the memory will be cleared, so there is no need to ensure that the data in the memory is not lost. The failure replacement information set is therefore not obtained from the non-volatile storage medium, which avoids redundant steps and improves the efficiency of the reset process.
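The difference between the warm and cold branches when loading the fault replacement information can be sketched as follows. The string reset types and the `replacement_set` key of the dictionary that models the non-volatile storage medium are hypothetical.

```python
def load_replacement_set(reset_type, nv_store):
    """Return the records to restore: the stored set on a warm reset, nothing on a cold reset."""
    if reset_type == "cold":
        return []  # memory is cleared anyway, so the stored set is not read
    if reset_type == "warm":
        return list(nv_store.get("replacement_set", []))
    raise ValueError(f"unknown reset type: {reset_type!r}")


nv_store = {"replacement_set": [{"first_unit": 0x10, "second_unit": 0xF0}]}
warm_records = load_replacement_set("warm", nv_store)
cold_records = load_replacement_set("cold", nv_store)
```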

It should be noted that steps 507 and 509 are optional steps. If steps 507 and 509 are not executed, step 508 can be directly executed after step 505 is executed.

In the embodiment of this application, a new concept of fault replacement information is proposed, and a first register dedicated to storing fault replacement information is added to the reset system. After the memory controller writes the fault replacement information into the first module, the processor core writes the newly generated fault replacement information into the non-volatile storage medium, so that a reset of the data processing system does not cause the fault replacement information to be lost. Even if fault replacement processing of faulty storage units causes some storage units in the memory to be isolated and replaced, after the system is reset it is still possible to know, based on the fault replacement information, which storage units in the memory are isolated faulty storage units. This avoids system downtime caused by accessing an isolated faulty storage unit, that is, the memory can be accessed correctly, so that the data in the memory is not lost when both fault replacement technology and reset technology are used for the memory.

On the basis of the embodiments corresponding to FIG. 1 to FIG. 6, in order to better implement the above-mentioned solutions of the embodiments of the present application, related equipment for implementing the above-mentioned solutions is also provided below. For details, refer to FIG. 7, which is a schematic diagram of a reset system provided by an embodiment of the application. The reset system 700 may include a reset control circuit 701, a processor core 7021, and a first module 703. The first module 703 includes a first register, the first register is used to store failure replacement information, the failure replacement information includes location information of the first storage unit, and the first storage unit is the faulty storage unit when failure replacement is performed on the storage units in the memory. The reset control circuit 701 is used to obtain a warm reset signal. The reset control circuit 701 is also used to send, in response to the obtained warm reset signal, a reset instruction to the second module 702, where the second module 702 includes the processor core 7021 and does not include the first module 703, and the reset instruction is used to trigger the execution of the reset operation.

In a possible design, the failure replacement information also includes location information of the second storage unit, and the second storage unit is a backup storage unit when the storage unit in the memory is replaced with a failure.

In a possible design, the granularity of the first storage unit is any one of the following: memory storage cells, memory rows, memory blocks, memory particles, memory planes, and memory bars.

In a possible design, the first module 703 further includes at least one second storage unit, and the second storage unit in the first module 703 is used to store the data of a first storage unit whose granularity is a memory storage cell.

In a possible design, please refer to FIG. 8. FIG. 8 is a system schematic diagram of the reset system provided by an embodiment of the application. The reset system 700 includes a memory controller 704, and the first module 703 is integrated in the memory controller 704. The reset control circuit 701 is specifically configured to control the processor core 7021 to perform a reset operation, and control the memory controller 704 not to perform a reset operation.

In a possible design, the reset control circuit 701 is also used to send a reset instruction to the processor core 7021 and the first module 703 when a cold reset signal is acquired.

In a possible design, the reset control circuit 701 is specifically configured to send a reset instruction to the processor core 7021 and a first instruction to the first module 703. The first instruction instructs the first module 703 not to perform a reset operation.
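The two designs above (a cold reset that resets both components, and a warm reset that sends a reset instruction to the processor core and a first instruction to the first module) can be contrasted in a small sketch. The instruction strings `"reset"` and `"hold"` and the class layout are hypothetical encodings chosen for illustration.

```python
class ProcessorCore:
    def __init__(self):
        self.reset_count = 0

    def handle(self, instruction):
        if instruction == "reset":
            self.reset_count += 1


class FirstModule:
    def __init__(self):
        self.first_register = None

    def handle(self, instruction):
        if instruction == "reset":
            self.first_register = None  # a reset clears the fault replacement information
        elif instruction != "hold":     # "hold" is a hypothetical form of the first instruction
            raise ValueError(f"unknown instruction: {instruction!r}")


def dispatch(signal, core, module):
    """Send instructions for a warm or cold reset signal."""
    core.handle("reset")
    module.handle("hold" if signal == "warm" else "reset")


core, module = ProcessorCore(), FirstModule()
module.first_register = {"first_unit": 0x10}
dispatch("warm", core, module)
register_after_warm = module.first_register
dispatch("cold", core, module)
register_after_cold = module.first_register
```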

It should be noted that the information interaction and execution process among the modules/units in the reset system 700 are based on the same concept as the method embodiments corresponding to FIG. 2 to FIG. 4 in this application. For details, please refer to the description in the foregoing method embodiments of this application, which is not repeated here.

An embodiment of the present application also provides a data processing system. For details, refer to FIG. 9, which is a system schematic diagram of a data processing system provided by an embodiment of the present application. The data processing system 900 includes a processor core 901 and a first module 902. The first module 902 includes a first register. The first register is used to store fault replacement information. The fault replacement information includes location information of the first storage unit, and the first storage unit is the faulty storage unit when failure replacement is performed on the storage units in the memory. The processor core 901 is used to obtain the fault replacement information from the first register; the processor core 901 is also used to write the fault replacement information into a non-volatile storage medium, so that the fault replacement information is not lost when the processor core 901 and the first module 902 perform a reset operation.

In a possible design, please refer to FIG. 10, which is a system schematic diagram of a data processing system provided by an embodiment of this application. The system 900 includes a memory controller 903, and the first module 902 is integrated in the memory controller 903.

In a possible design, the processor core 901 is also used to obtain the failure replacement information set from the non-volatile storage medium when the reset operation is a warm reset operation, and to backfill the failure replacement information set to the first register during the resetting of the first register, where the failure replacement information set includes at least one piece of failure replacement information.

In a possible design, the first module 902 further includes at least one second storage unit, and the second storage unit in the first module 902 is used to store the first data of a first storage unit whose granularity is a memory storage cell. The processor core 901 is also used to obtain the first data from the second storage unit in the first module 902 and write the first data into the non-volatile storage medium, so that the first data is not lost when the processor core 901 and the first module 902 perform a reset operation. The processor core 901 is also used to obtain the failure replacement information set and the first data from the non-volatile storage medium when the reset operation is a warm reset operation, and, during the resetting of the first module 902, to backfill the failure replacement information set to the first register and backfill the first data to the second storage unit in the first module 902, where the failure replacement information set includes at least one piece of failure replacement information.

In a possible design, the processor core 901 is also used to perform a reset operation on the first module 902 to initialize the first module 902. The processor core 901 is also used to obtain, when the reset operation is a warm reset operation, the failure replacement information set from the non-volatile storage medium, and to perform a reverse replacement operation on the data in the storage units of the memory according to the failure replacement information set, where the failure replacement information set includes at least one piece of failure replacement information, and the reverse replacement operation is used to rewrite the data in the second storage unit into the first storage unit, so that the distribution of the data in the memory across the storage units is restored to the initial state.

In a possible design, the processor core 901 is also used to not obtain the failure replacement information set from the non-volatile storage medium when the reset operation is a cold reset operation.

It should be noted that the information interaction and execution process among the modules/units in the data processing system 900 are based on the same concept as the method embodiments corresponding to FIG. 5 and FIG. 6 in this application. For details, please refer to the description in the foregoing method embodiments of this application, which is not repeated here.

An embodiment of the present application also provides a computer device. Please refer to FIG. 11, which is a schematic structural diagram of the computer device provided in an embodiment of this application. The reset system 700 described in the embodiment corresponding to FIG. 7 or FIG. 8 may be deployed on the computer device 110 to implement the function of the reset system in the embodiment corresponding to FIG. 2 to FIG. 4. Alternatively, the data processing system 900 described in the embodiment corresponding to FIG. 9 or FIG. 10 may be deployed on the computer device 110 to implement the functions of the data processing system in the embodiment corresponding to FIG. 5 or FIG. 6. Specifically, the computer device 110 includes: a wired or wireless network interface 1101, an input/output interface 1102, a processor 1103, and a non-volatile storage medium 1104 (the number of processors 1103 in the computer device 110 may be one or more, and one processor is taken as an example in FIG. 11). The processor 1103 may include an application processor 11031 and a communication processor 11032. The memory 1104 may include a non-volatile storage medium 11041 and a memory 11042. In some embodiments of the present application, the wired or wireless network interface 1101, the input/output interface 1102, the processor 1103, and the non-volatile storage medium 1104 may be connected by a bus or other means.

The memory 11042 may include a read-only memory and a random access memory, and provides instructions and data to the processor 1103. A part of the non-volatile storage medium 11041 may also include a non-volatile random access memory (NVRAM). The non-volatile storage medium 1104 stores processors and operating instructions, executable modules or data structures, or their subsets, or their extended sets. The operating instructions may include various operating instructions for implementing various operations.

The processor 1103 controls the operation of the computer device. In specific applications, the various components of the computer equipment are coupled together through a bus system, where the bus system may include a power bus, a control bus, and a status signal bus in addition to a data bus. However, for the sake of clear description, various buses are referred to as bus systems in the figure.

The method disclosed in the foregoing embodiment of the present application may be applied to the processor 1103 or implemented by the processor 1103. The processor 1103 may be an integrated circuit chip with signal processing capability. In the implementation process, the steps of the foregoing method can be completed by an integrated logic circuit of hardware in the processor 1103 or by instructions in the form of software. The aforementioned processor 1103 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor or a microcontroller, and may further include an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic devices, discrete gates or transistor logic devices, or discrete hardware components. The processor 1103 can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor. The software module can be located in a storage medium that is mature in the field, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1104, and the processor 1103 reads the information in the memory 1104 and completes the steps of the foregoing method in combination with its hardware.

The wired or wireless network interface 1101 is used to implement the signal sending and signal receiving functions of the computer device 110. The input/output interface 1102 can be used to receive input digital or character information and to output digital or character information; the input/output interface 1102 can also be used to send instructions to the disk group through the first interface to modify the data in the disk group; the input/output interface 1102 can also include display devices such as display screens.

In the embodiment of the present application, in one case, the application processor 11031 is configured to implement the function of the reset system in the embodiment corresponding to FIG. 2 to FIG. 4. It should be noted that, for the specific manner in which the application processor 11031 implements the function of the reset system in the embodiment corresponding to FIG. 2 to FIG. 4 and for the beneficial effects brought about, please refer to the descriptions in the respective method embodiments corresponding to FIG. 2 to FIG. 4, which are not repeated one by one here.

In the embodiment of the present application, in another case, the application processor 11031 is configured to implement the function of the data processing system in the embodiment corresponding to FIG. 5 or FIG. 6. It should be noted that, for the specific manner in which the application processor 11031 implements the function of the data processing system in the embodiment corresponding to FIG. 5 or FIG. 6 and for the beneficial effects brought about, please refer to the descriptions in the respective method embodiments corresponding to FIG. 5 or FIG. 6, which are not repeated one by one here.

The embodiment of the present application also provides a computer-readable storage medium. The computer-readable storage medium stores a program which, when run on a computer, causes the computer to execute the steps performed by the reset system in the method described in the foregoing embodiments shown in FIG. 2 to FIG. 4, or the steps performed by the data processing system in the method described in the foregoing embodiment shown in FIG. 5 or FIG. 6.

The embodiment of the present application also provides a product including a computer program which, when run on a computer, causes the computer to execute the steps performed by the reset system in the method described in the embodiments shown in FIG. 2 to FIG. 4, or the steps performed by the data processing system in the method described in the embodiment shown in FIG. 5 or FIG. 6.

An embodiment of the present application also provides a circuit system. The circuit system includes a processing circuit configured to perform the steps performed by the reset system in the method described in the embodiments shown in FIG. 2 to FIG. 4, or to perform the steps performed by the data processing system in the method described in the embodiment shown in FIG. 5 or FIG. 6.

The reset system or data processing system provided by the embodiment of the present application may specifically be a chip. The chip includes a processing unit and a communication unit. The processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit. The processing unit can execute the computer-executable instructions stored in the storage unit to make the chip execute the reset method described in the embodiment shown in FIG. 2 to FIG. 4, or the data processing method described in the embodiment shown in FIG. 5 or FIG. . Optionally, the storage unit is a storage unit in the chip, such as a register or a cache; the storage unit may also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).

Wherein, the processor mentioned in any of the foregoing may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits used to control the execution of the program of the method in the foregoing first aspect.

In addition, it should be noted that the device embodiments described above are only illustrative, and the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physically separate. The physical unit can be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the objectives of the solutions of the embodiments. In addition, in the drawings of the device embodiments provided in the present application, the connection relationship between the modules indicates that they have a communication connection between them, which can be specifically implemented as one or more communication buses or signal lines.

Through the description of the above embodiments, those skilled in the art can clearly understand that this application can be implemented by means of software plus the necessary general-purpose hardware. Of course, it can also be implemented by dedicated hardware, including dedicated integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and so on. Under normal circumstances, any function completed by a computer program can easily be implemented with corresponding hardware, and the specific hardware structures used to achieve the same function can be diverse, such as analog circuits, digital circuits, or special-purpose circuits. However, for this application, a software implementation is the better implementation in most cases. Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, USB flash drive, removable hard disk, ROM, RAM, magnetic disk, or optical disc, and includes several instructions that cause a computer device (which can be a personal computer, server, network device, etc.) to execute the methods described in the embodiments of this application.
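As a purely illustrative software model of the reset routing described in the foregoing embodiments, the following Python sketch shows a reset control circuit that forwards a hot reset only to the processor core, so that the first register keeps its failure replacement information, while a cold reset also reaches the first module. The names `ResetSignal`, `FirstModule`, `ProcessorCore`, `ResetControlCircuit`, and `handle`, as well as the dictionary used to model the first register, are assumptions made for this sketch and do not appear in the application.

```python
from dataclasses import dataclass, field
from enum import Enum


class ResetSignal(Enum):
    HOT = "hot"    # reset that must preserve failure replacement information
    COLD = "cold"  # full reset that also reaches the first module


@dataclass
class FirstModule:
    """Hypothetical stand-in for the first module and its first register."""
    # Maps the location of a failed (first) storage unit to the location
    # of its backup (second) storage unit.
    first_register: dict = field(default_factory=dict)

    def reset(self) -> None:
        # Resetting the first module clears the first register.
        self.first_register.clear()


@dataclass
class ProcessorCore:
    """Hypothetical stand-in for the processor core in the second module."""
    reset_count: int = 0

    def reset(self) -> None:
        self.reset_count += 1


class ResetControlCircuit:
    """Routes a reset instruction according to the type of reset signal."""

    def __init__(self, core: ProcessorCore, first_module: FirstModule) -> None:
        self.core = core
        self.first_module = first_module

    def handle(self, signal: ResetSignal) -> None:
        # The processor core is reset for both kinds of signal.
        self.core.reset()
        # Only a cold reset is also sent to the first module.
        if signal is ResetSignal.COLD:
            self.first_module.reset()
```

In this model, calling `handle(ResetSignal.HOT)` leaves `first_register` untouched, whereas `handle(ResetSignal.COLD)` empties it.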

The foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented by software, they can be implemented in whole or in part in the form of a computer program product.

The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (such as infrared, radio, or microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).
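To illustrate how a program might carry out the data processing method of the foregoing embodiments, the following Python sketch writes a failure replacement information set to a non-volatile medium and reads it back for backfilling only after a hot reset. It is a minimal sketch under stated assumptions: a JSON file stands in for the non-volatile storage medium, and the function names `persist_failure_replacement_info` and `restore_first_register` and the entry fields `failed_unit` and `backup_unit` are hypothetical.

```python
import json
from pathlib import Path


def persist_failure_replacement_info(entries, nv_path):
    """Write the failure replacement information set to the non-volatile stand-in.

    `entries` is a list of dicts, each recording the location of a failed
    (first) storage unit and of its backup (second) storage unit.
    """
    Path(nv_path).write_text(json.dumps(entries))


def restore_first_register(nv_path, hot_reset):
    """Return the entries to backfill into the first register after a reset.

    After a cold reset, or when nothing has been persisted yet, no failure
    replacement information is read back and an empty list is returned.
    """
    path = Path(nv_path)
    if not hot_reset or not path.exists():
        return []
    return json.loads(path.read_text())
```

A caller would invoke `persist_failure_replacement_info` whenever the first register changes, and use the list returned by `restore_first_register` to repopulate the first register during the reset process.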

Claims

1. A reset system, characterized in that the system includes a reset control circuit, a processor core, and a first module, wherein the first module includes a first register, the first register is used to store failure replacement information, the failure replacement information includes location information of a first storage unit, and the first storage unit is the storage unit that has failed when failure replacement is performed on storage units in a memory;

The reset control circuit is used to obtain a hot reset signal;

The reset control circuit is further configured to send a reset instruction to a second module in response to the acquired hot reset signal, where the second module includes the processor core and does not include the first module, and the reset instruction is used to trigger a reset operation.
2. The system according to claim 1, wherein the failure replacement information further includes location information of a second storage unit, and the second storage unit is the backup storage unit used when failure replacement is performed on the storage units in the memory.
3. The system according to claim 1, wherein the granularity of the first storage unit is any one of the following: memory storage cells, memory rows, memory blocks, memory particles, memory planes, and memory bars.
4. The system according to claim 2, wherein the first module further comprises at least one second storage unit, and the second storage unit in the first module is used to store, in the case that at least one first storage unit is a memory storage cell, the data in the first storage unit that is the memory storage cell.
5. The system according to any one of claims 1 to 4, wherein the system comprises a memory controller, and the first module is integrated in the memory controller;

The reset control circuit is specifically configured to send a reset instruction to the processor core, and not send a reset instruction to the memory controller.
6. The system according to any one of claims 1 to 4, characterized in that:

The reset control circuit is further configured to send a reset instruction to the processor core and the first module when a cold reset signal is acquired.
7. A data processing system, wherein the system includes a processor core and a first module, the first module includes a first register, the first register is used to store failure replacement information, the failure replacement information includes location information of a first storage unit, and the first storage unit is the storage unit that has failed when failure replacement is performed on storage units in a memory;

The processor core is configured to obtain the failure replacement information from the first register;

The processor core is also used to write the failure replacement information into a non-volatile storage medium.
8. The system according to claim 7, wherein the failure replacement information further includes location information of a second storage unit, and the second storage unit is the backup storage unit used when failure replacement is performed on the storage units in the memory.
9. The system according to claim 7, wherein the granularity of the first storage unit is any one of the following: memory storage cells, memory rows, memory blocks, memory particles, memory planes, and memory bars.
10. The system according to any one of claims 7 to 9, wherein the system comprises a memory controller, and the first module is integrated in the memory controller.
11. The system according to any one of claims 7 to 9, characterized in that:

The processor core is further configured to obtain a failure replacement information set from the non-volatile storage medium when the reset operation is a hot reset operation, and, during the reset process of the first register, backfill the failure replacement information set to the first register, wherein the failure replacement information set includes at least one piece of the failure replacement information.
12. The system according to claim 8, wherein the first module further comprises at least one second storage unit, and the second storage unit in the first module is used to store, in the case that at least one first storage unit is a memory storage cell, the first data in the first storage unit that is the memory storage cell;

The processor core is further configured to obtain the first data from the second storage unit in the first module, and write the first data into the non-volatile storage medium, so that the first data is not lost when the processor core and the first module perform a reset operation;

The processor core is further configured to obtain the failure replacement information set and the first data from the non-volatile storage medium when the reset operation is a hot reset operation, and, during the reset process of the first module, backfill the failure replacement information set to the first register and backfill the first data to the second storage unit in the first module, wherein the failure replacement information set includes at least one piece of the failure replacement information.
13. The system according to claim 8, wherein:

The processor core is further configured to perform a reset operation on the first module to initialize the first module;

The processor core is further configured to obtain the failure replacement information set from the non-volatile storage medium when the reset operation is a hot reset operation, and to perform, according to the failure replacement information set, a reverse replacement operation on the data in the storage units of the memory, wherein the failure replacement information set includes at least one piece of the failure replacement information, and the reverse replacement operation is used to rewrite the data in the second storage unit into the first storage unit, so that the distribution of the data of the memory across the storage units is restored to an initial state.
14. The system according to claim 11, wherein:

The processor core is further configured to not obtain the failure replacement information set from the non-volatile storage medium when the reset operation is a cold reset operation.
15. A reset method, characterized in that the method is applied to a reset system, the system includes a reset control circuit, a processor core, and a first module, the first module includes a first register, the first register is used for storing failure replacement information, the failure replacement information includes location information of a first storage unit, and the first storage unit is the storage unit that has failed when failure replacement is performed on storage units in a memory;

The reset control circuit obtains a hot reset signal;

The reset control circuit, in response to the acquired hot reset signal, sends a reset instruction to a second module, where the second module includes the processor core and does not include the first module, and the reset instruction is used to trigger execution of a reset operation.
16. The method according to claim 15, wherein the failure replacement information further includes location information of a second storage unit, and the second storage unit is the backup storage unit used when failure replacement is performed on the storage units in the memory.
17. The method according to claim 15, wherein the granularity of the first storage unit is any one of the following: memory storage cells, memory rows, memory blocks, memory particles, memory planes, and memory bars.
18. The method according to claim 16, wherein the first module further comprises at least one second storage unit, and the second storage unit in the first module is used to store, in the case that at least one first storage unit is a memory storage cell, the data in the first storage unit that is the memory storage cell.
19. The method according to any one of claims 15 to 18, wherein the reset system comprises a memory controller, and the first module is integrated in the memory controller;

The sending, by the reset control circuit, of a reset instruction to the processor core and not to the first module includes:

The reset control circuit sends a reset instruction to the processor core, and does not send a reset instruction to the memory controller.
20. The method according to any one of claims 15 to 18, wherein the method further comprises:

The reset control circuit sends a reset instruction to the processor core and the first module when a cold reset signal is acquired.
21. A data processing method, wherein the method is applied to a data processing system, the data processing system includes a processor core and a first module, the first module includes a first register, the first register is used to store failure replacement information, the failure replacement information includes location information of a first storage unit, and the first storage unit is the storage unit that has failed when failure replacement is performed on storage units in a memory;

The processor core obtains the failure replacement information from the first register;

The processor core writes the failure replacement information into a non-volatile storage medium.
22. The method according to claim 21, wherein the failure replacement information further includes location information of a second storage unit, and the second storage unit is the backup storage unit used when failure replacement is performed on the storage units in the memory.
23. The method according to claim 21, wherein the granularity of the first storage unit is any one of the following: memory storage cells, memory rows, memory blocks, memory particles, memory planes, and memory bars.
24. The method according to any one of claims 21 to 23, wherein the data processing system comprises a memory controller, and the first module is integrated in the memory controller.
25. The method according to any one of claims 21 to 23, wherein the method further comprises:

In the case that the reset operation is a hot reset operation, the processor core obtains a failure replacement information set from the non-volatile storage medium and, during the reset process of the first module, backfills the failure replacement information set to the first register, wherein the failure replacement information set includes at least one piece of the failure replacement information.
26. The method according to claim 22, wherein the first module further comprises at least one second storage unit, and the second storage unit in the first module is used to store, in the case that at least one first storage unit is a memory storage cell, the first data in the first storage unit that is the memory storage cell;

The method also includes:

The processor core obtains the first data from the second storage unit in the first module, and writes the first data into the non-volatile storage medium, so that the first data is not lost when the processor core and the first module perform a reset operation;

In the case that the reset operation is a hot reset operation, the processor core obtains the failure replacement information set and the first data from the non-volatile storage medium and, during the reset process of the first module, backfills the failure replacement information set to the first register and backfills the first data to the second storage unit in the first module, wherein the failure replacement information set includes at least one piece of the failure replacement information.
27. The method according to claim 22, wherein the method further comprises:

The processor core performs a reset operation on the first module to initialize the first module;

In the case that the reset operation is a hot reset operation, the processor core obtains the failure replacement information set from the non-volatile storage medium and, according to the failure replacement information set, performs a reverse replacement operation on the data in the storage units of the memory, wherein the failure replacement information set includes at least one piece of the failure replacement information, and the reverse replacement operation is used to rewrite the data in the second storage unit into the first storage unit, so that the distribution of the data of the memory across the storage units is restored to the initial state.
28. The method according to claim 25, wherein the method further comprises:

In the case that the reset operation is a cold reset operation, the processor core does not obtain the failure replacement information set from the non-volatile storage medium.
29. A computer device, wherein the computer device is equipped with the reset system according to any one of claims 1 to 6, or the computer device is equipped with the data processing system according to any one of claims 7 to 14.