CN111221775B - Processor, cache processing method and electronic equipment - Google Patents

Publication number
CN111221775B
Authority
CN
China
Prior art keywords
cache
cache line
processor
line
error
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811408253.7A
Other languages
Chinese (zh)
Other versions
CN111221775A (en)
Inventor
宋文俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201811408253.7A
Publication of CN111221775A
Application granted
Publication of CN111221775B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00: Digital computers in general; Data processing equipment in general
    • G06F15/76: Architectures of general purpose stored program computers
    • G06F15/78: Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7839: Architectures of general purpose stored program computers comprising a single central processing unit with memory
    • G06F15/7842: Architectures of general purpose stored program computers comprising a single central processing unit with memory on one IC chip (single chip microcontrollers)
    • G06F15/7846: On-chip cache and off-chip main memory
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

Embodiments of the present application provide a processor, a cache processing method, and an electronic device. The processor comprises a cache and a detection device; the cache comprises a plurality of cache lines, and the detection device is configured to isolate at least one cache line of the plurality of cache lines when an error occurring in that cache line meets a preset condition. Isolating a cache line whose errors meet the preset condition improves the reliability of the cache. Because isolation is performed at cache-line granularity, a problem in one or more cache lines does not affect the overall use of the processor, which effectively improves processor performance.

Description

Processor, cache processing method and electronic equipment
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a processor, a cache processing method, and an electronic device.
Background
In modern data centers, server reliability constantly affects service stability, and any unpredictable downtime can interrupt service and cause irrecoverable losses for a company. Analysis of a large amount of downtime data from data centers across the industry shows that errors in the Cache and the TLB (Translation Lookaside Buffer) of the CPU (Central Processing Unit) account for more than half of all downtime.
The industry currently lacks a good isolation mechanism for the Cache and the TLB in the CPU that would improve their reliability and reduce the system faults they cause.
Disclosure of Invention
In view of this, embodiments of the present application provide a processor, a cache processing method, and an electronic device, so as to improve reliability of the processor.
In a first aspect, embodiments of the present application provide a processor, comprising: a cache and a detection device;
the cache includes a plurality of cache lines;
the detection device is used for isolating at least one cache line when an error occurring in the at least one cache line in the plurality of cache lines meets a preset condition.
In a second aspect, an embodiment of the present application provides a cache processing method, including:
obtaining error information of at least one cache line in a plurality of cache lines;
and if the error information meets the preset condition, isolating the at least one cache line.
In a third aspect, embodiments of the present application provide an electronic device including a processor in the first aspect.
In the processor, the cache processing method, and the electronic device provided by the embodiments of the present application, the processor comprises a cache and a detection device. The cache may comprise a plurality of cache lines, and the detection device may isolate at least one cache line of the plurality of cache lines when an error occurring in that cache line meets a preset condition, which improves the reliability of the cache. Because isolation is performed at cache-line granularity, a problem in one or more cache lines does not affect the overall use of the processor, which effectively improves processor performance.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, a brief description will be given below of the drawings that are needed in the embodiments or the prior art descriptions, and it is obvious that the drawings in the following description are some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a block diagram of a processor according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a Cache architecture of a processor according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a processor according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a memory space of a Cache in a processor according to an embodiment of the present disclosure;
FIG. 5 is a schematic view of inspection logic of a processor according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of another processor according to an embodiment of the present disclosure;
FIG. 7 is a flow chart of a cache processing method according to an embodiment of the present application;
FIG. 8 is a flow chart of a data processing method according to an embodiment of the present application;
FIG. 9 is a schematic diagram of a data processing apparatus according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
The terminology used in the embodiments of the application is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. The term "plurality" generally means at least two, but does not exclude the case of at least one.
It should be understood that the term "and/or" as used herein merely describes an association between associated objects and means that three relationships are possible. For example, "A and/or B" may represent: A exists alone, A and B exist together, or B exists alone. In addition, the character "/" herein generally indicates that the objects before and after it are in an "or" relationship.
The word "if", as used herein, may be interpreted as "when", "upon", "in response to determining", or "in response to detecting", depending on the context. Similarly, the phrase "if determined" or "if (a stated condition or event) is detected" may be interpreted as "when determined", "in response to determining", "when (the stated condition or event) is detected", or "in response to detecting (the stated condition or event)", depending on the context.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a product or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such product or system. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a product or system comprising that element.
The processor is the computing core and control core of a computer and mainly comprises an arithmetic logic unit, a cache, and other components. The cache may include a plurality of cache lines, and the embodiments of the present application are mainly used to monitor the cache lines in the processor.
FIG. 1 is a block diagram of a processor according to an embodiment of the present application. As shown in FIG. 1, a processor in an embodiment of the present application may include: a cache 11 and a detection device 12.
The cache 11 may comprise a plurality of cache lines; the detection device 12 may be configured to isolate at least one cache line of the plurality of cache lines when an error occurring in that cache line meets a preset condition.
The processor may be a central processing unit or any other type of processor. The cache 11 may be a Cache or any other type of cache, such as a TLB. For convenience of description, the embodiments of the present application are described in detail taking the cache 11 as a Cache and the cache line as a Cache Line.
Specifically, the Cache may include a plurality of Cache Lines, and the detection device 12 may isolate a Cache Line when an error occurring in that Cache Line meets the preset condition.
FIG. 2 is a schematic diagram of a Cache architecture of a processor according to an embodiment of the present application. As shown in FIG. 2, the processor may include one or more CPU Dies, each of which may include one or more CPU Cores.
Each CPU Core may be configured with a Cache, a memory located between the CPU execution unit and the DRAM (Dynamic Random Access Memory, i.e., main memory). The Cache is small in capacity but very fast, and typically consists of SRAM (Static Random Access Memory). Because the CPU is far faster than main memory, it must wait for a certain period whenever it accesses data in memory directly. The Cache stores part of the data that the CPU has just used or uses repeatedly; if the CPU needs that data again, it can fetch it directly from the Cache, which avoids repeated memory accesses, reduces CPU waiting time, and thus improves system efficiency.
The Cache may be of various types. Optionally, the types of Cache may include:
the first-level instruction Cache (Level-1 Instruction Cache, L1I Cache, also called the L1 instruction Cache), which caches CPU instructions and is a read-only Cache; generally, each CPU Core has its own L1I Cache;
the first-level data Cache (Level-1 Data Cache, L1D Cache, also called the L1 data Cache), which caches data accessed by the CPU and has the highest speed and the smallest capacity; generally, each CPU Core has its own L1D Cache;
the second-level Cache (Level-2 Cache, L2 Cache), which caches data accessed by the CPU, sits between the first-level Cache and the last-level Cache, and has moderate speed and capacity; generally, each CPU Core has its own L2 Cache;
the last-level Cache (Last Level Cache, LLC), which caches data accessed by the CPU or by IO devices and is slower than L1 and L2 but has the largest capacity; generally, each CPU Die has one LLC, which different CPU Cores can share.
The first-level instruction Cache, the first-level data Cache, and the second-level Cache can be arranged inside the CPU Core, and the last-level Cache can be arranged outside the CPU Core.
In this embodiment of the present application, one or more types of caches may be disposed in the CPU, and different CPU cores may be configured with the same type of Cache, or may be configured with different types of caches. For example, a first-level instruction Cache, a first-level data Cache and a second-level Cache can be arranged in each CPU Core, and a last-level Cache can be arranged outside the CPU cores; or, one CPU Core may be provided with only the first-level instruction Cache and the first-level data Cache, and the other CPU Core may be provided with the first-level instruction Cache, the first-level data Cache and the second-level Cache.
Optionally, the processor may contain a plurality of CPU Cores, and each CPU Core may be configured with at least one type of Cache and a detection device 12. The Cache may include a plurality of Cache Lines.
FIG. 3 is a schematic structural diagram of a processor according to an embodiment of the present application. As shown in FIG. 3, the CPU Die may include a plurality of CPU Cores, each of which may be configured with a corresponding detection device 12 in addition to its Cache. For convenience of description, FIG. 3 shows only the Cache provided inside each CPU Core and omits the Cache provided outside the CPU Core.
In the embodiments of the present application, configuring a CPU Core with a Cache means that a Cache (generally at least one of the first-level instruction Cache, the first-level data Cache, and the second-level Cache) is provided inside the CPU Core, and/or that a Cache (generally the last-level Cache) connected to the CPU Core is provided outside it. Configuring a CPU Core with a detection device 12 means that a detection device 12 connected to the Cache corresponding to that CPU Core is provided inside or outside the CPU Core, where the detection device 12 is configured to isolate a Cache Line when an error occurring in a Cache Line corresponding to the CPU Core meets the preset condition.
The preset condition can be set according to actual needs. Optionally, the detection device 12 may be specifically configured to isolate a Cache Line if the number of Correctable Errors (CE) that have occurred in the Cache Line is greater than a corresponding threshold, or if an uncorrectable error (Uncorrected Error, UCE) occurs in the Cache Line.
Here, a correctable error refers to a hardware error in the system that has already been corrected, such as a 1-bit ECC error in the Cache. An uncorrectable error refers to a hardware error in the system that cannot be corrected. Uncorrectable errors fall into two classes: in one, the system context is destroyed and the system can only crash; in the other, the error cannot be corrected but the system context is intact, so the error can be remedied by isolation measures. If the detection device 12 detects that an uncorrectable error has occurred in a Cache Line, or that correctable errors have accumulated to a certain extent, the Cache Line can be isolated.
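The isolation condition described above can be sketched in a few lines of Python. This is only an illustrative software model, not the hardware logic of the detection device 12; the function name `should_isolate` and its parameters are hypothetical.

```python
def should_isolate(ce_count: int, threshold: int, uce_seen: bool) -> bool:
    """Model of the preset condition: isolate on any uncorrectable error (UCE),
    or once the correctable-error (CE) count exceeds the threshold."""
    return uce_seen or ce_count > threshold


# With an assumed threshold of 3, the fourth CE triggers isolation.
print(should_isolate(ce_count=3, threshold=3, uce_seen=False))  # False
print(should_isolate(ce_count=4, threshold=3, uce_seen=False))  # True
print(should_isolate(ce_count=0, threshold=3, uce_seen=True))   # True
```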
Those skilled in the art can implement the detection device 12 according to actual needs, as long as it provides the functions described in the embodiments of the present application. For example, determining whether a correctable or uncorrectable error has occurred in a Cache Line may use an existing Cache detection method; determining whether the number of correctable errors is greater than the corresponding threshold may be implemented with gate circuits or the like; and isolating a Cache Line may be implemented by setting a corresponding bit of the Cache Line to 0 or 1.
Optionally, a corresponding storage space may be set in the Cache Line, for storing information such as the number of times of occurrence of the correctable error and the isolation state.
FIG. 4 is a schematic diagram of a memory space of a Cache in a processor according to an embodiment of the present application. As shown in FIG. 4, in the prior art a Cache Line contains, in addition to the data field, a Tag field and a Status field. The Tag field stores a physical address, which may be used for addressing within a group (set). The Status field contains status information such as the Valid bit, the Lock bit, and the Dirty bit: the Valid bit indicates whether the Cache Line is valid, the Lock bit indicates whether the Cache Line is locked and therefore whether write operations are allowed, and the Dirty bit indicates whether the data in the Cache Line is consistent with main memory.
In the embodiments of the present application, additional storage space may be added to the Cache Line. Specifically, the Status field may be extended with an ID field (identity field, which stores ID information), a CE field (correctable error field, which stores the number of correctable errors that have occurred in the Cache Line), and an Online bit (which indicates whether the Cache Line is isolated, that is, whether it is available).
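As an illustration of the extended Status field, the following Python sketch models the fields as a small record and packs them into a single status word. The field widths (`CE_BITS`, `ID_BITS`), the bit ordering, and all names are assumptions made for the example; the source does not specify any layout.

```python
from dataclasses import dataclass

# Assumed field widths; the source does not specify any sizes.
CE_BITS = 4
ID_BITS = 16


@dataclass
class CacheLineStatus:
    valid: bool = False
    lock: bool = False
    dirty: bool = False
    online: bool = True   # 1 = not isolated, 0 = isolated
    ce_count: int = 0     # CE field: correctable errors seen so far
    line_id: int = 0      # ID field: unique within the Cache

    def pack(self) -> int:
        """Pack the fields into one integer status word (hypothetical layout)."""
        word = int(self.valid)
        word |= int(self.lock) << 1
        word |= int(self.dirty) << 2
        word |= int(self.online) << 3
        word |= (self.ce_count & ((1 << CE_BITS) - 1)) << 4
        word |= (self.line_id & ((1 << ID_BITS) - 1)) << (4 + CE_BITS)
        return word

    @classmethod
    def unpack(cls, word: int) -> "CacheLineStatus":
        """Inverse of pack()."""
        return cls(
            valid=bool(word & 1),
            lock=bool((word >> 1) & 1),
            dirty=bool((word >> 2) & 1),
            online=bool((word >> 3) & 1),
            ce_count=(word >> 4) & ((1 << CE_BITS) - 1),
            line_id=(word >> (4 + CE_BITS)) & ((1 << ID_BITS) - 1),
        )
```

A freshly initialized `CacheLineStatus()` has `online=True`, matching the convention that the Online bit is set to 1 after initialization.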
Accordingly, the detection device 12 may be specifically configured to: if a correctable error is detected in a Cache Line, add 1 to the value of the CE field of the Cache Line and determine whether the value of the CE field is greater than the corresponding threshold; if so, update the Online bit of the Cache Line.
Optionally, an Online bit of 0 may indicate that the Cache Line is isolated, and an Online bit of 1 may indicate that it is not isolated. After each initialization the Online bit is set to 1, and when the value of the CE field is detected to be greater than the corresponding threshold, the Online bit may be set to 0.
Alternatively, when the value of the CE field is detected to be greater than the corresponding threshold, the detection device 12 may first check whether the Lock bit is 0. If the Lock bit is 0, the Online bit is set to 0. If the Lock bit is 1, the Cache Line is in use; the Online bit may be left unchanged for the time being and set to 0 once the Lock bit returns to 0.
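The counting and Lock-aware isolation steps above can be sketched as follows. This is a minimal software model under stated assumptions: the `pending_isolation` flag is a hypothetical helper used to remember a deferred isolation, and the function names are illustrative rather than part of the source.

```python
from dataclasses import dataclass


@dataclass
class Line:
    ce_count: int = 0                # CE field
    lock: bool = False               # Lock bit
    online: bool = True              # Online bit, 1 = not isolated
    pending_isolation: bool = False  # hypothetical: isolation deferred by Lock


def on_correctable_error(line: Line, threshold: int) -> None:
    """Increment the CE field and isolate the line if the threshold is exceeded."""
    line.ce_count += 1
    if line.ce_count > threshold:
        if line.lock:
            line.pending_isolation = True  # line is in use; isolate later
        else:
            line.online = False


def on_unlock(line: Line) -> None:
    """Clear the Lock bit and apply any isolation that was deferred."""
    line.lock = False
    if line.pending_isolation:
        line.online = False
        line.pending_isolation = False


line = Line(lock=True)
for _ in range(4):                   # four CEs against an assumed threshold of 3
    on_correctable_error(line, threshold=3)
print(line.online)                   # True: still locked, isolation deferred
on_unlock(line)
print(line.online)                   # False: isolated once the lock is released
```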
When an execution unit of the CPU or an IO device accesses a Cache Line, it can first read the Online bit of the Cache Line. If the Online bit is 1, the Cache Line is not isolated and can be accessed normally; if the Online bit is 0, the Cache Line is isolated and cannot be accessed.
Of course, the opposite encoding may also be used, with an Online bit of 1 indicating that the Cache Line is isolated and 0 indicating that it is not, as long as the Online bit can represent whether the Cache Line is isolated.
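How an access might honor the Online bit can be illustrated with a simplified lookup over one set of Cache Lines. The sketch assumes the encoding in which Online = 1 means "not isolated"; the dictionary keys and the `lookup` function are hypothetical.

```python
def lookup(cache_set, tag):
    """Return the data of the online, valid line whose Tag matches, or None on a miss."""
    for line in cache_set:
        if not line["online"]:
            continue  # isolated lines are never accessed
        if line["valid"] and line["tag"] == tag:
            return line["data"]
    return None


cache_set = [
    {"online": False, "valid": True, "tag": 0x1A, "data": b"stale"},
    {"online": True, "valid": True, "tag": 0x2B, "data": b"good"},
]
print(lookup(cache_set, 0x2B))  # b'good'
print(lookup(cache_set, 0x1A))  # None: the matching line is isolated
```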
The embodiments of the present application provide a fault isolation mechanism that operates per Cache Line: once an error is found in a Cache Line, only that Cache Line needs to be isolated, not the CPU Core or CPU Die in which it resides. The overall use of the CPU Core or CPU Die is therefore unaffected, the impact of the isolation mechanism on the system is reduced, and the mechanism can be widely deployed in data centers.
The detection device 12 can be realized by hardware, so that fault isolation related to the Cache is migrated from a software level to a hardware level, and the reliability and the success probability of the isolation are improved.
RAS (Reliability, Availability and Serviceability) is an important feature of modern devices. Reliability means that the system must be as reliable as possible and must not crash or suffer physical damage. A reliable system must therefore be able to repair small errors by itself and to isolate, as far as possible, errors that it cannot repair, so that the rest of the system keeps operating normally. Availability means that the system must stay online for as long as possible: even if small problems occur, they do not affect the normal operation of the whole system, and in some cases faulty components can even be replaced by Hot Plug, so that system downtime is strictly kept within a certain range. Serviceability means that the system provides convenient diagnostic functions, such as system logs and dynamic detection, so that administrators can easily diagnose and maintain the system and errors can be found and repaired early.
The purpose of RAS is to keep the whole system operating reliably for as long as possible without going offline, backed by a sufficiently powerful fault-tolerance mechanism. This is indispensable in application environments such as large data centers, network centers such as stock exchanges, telecommunication equipment rooms, and bank database centers. The technical solution provided by the embodiments of the present application can effectively improve the RAS of the system and ensure its normal operation.
Optionally, thresholds may be set per Cache type, and different Cache types may have different thresholds. For example, the threshold for the L1 instruction Cache may be 3 and the threshold for the L1 data Cache may be 5. A Cache Line in the L1 instruction Cache can then be isolated when its number of correctable errors exceeds 3, and a Cache Line in the L1 data Cache when its number exceeds 5. Different Cache types are thus handled differently, meeting their different usage requirements.
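The per-type thresholds in this example can be expressed as a small lookup table. The values 3 and 5 come from the example above; the dictionary keys and the function name are illustrative assumptions, and no thresholds are implied for other Cache types.

```python
# Thresholds from the example in the text; other Cache types would need their own values.
CE_THRESHOLDS = {"L1I": 3, "L1D": 5}


def exceeds_threshold(cache_type: str, ce_count: int) -> bool:
    """True when the CE count of a line is above the threshold for its Cache type."""
    return ce_count > CE_THRESHOLDS[cache_type]


print(exceeds_threshold("L1I", 4))  # True: 4 > 3
print(exceeds_threshold("L1D", 4))  # False: 4 <= 5
```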
In summary, the processor provided in this embodiment includes a Cache and a detection device 12. The Cache may include a plurality of Cache Lines, and the detection device 12 may isolate at least one Cache Line of the plurality of Cache Lines when an error occurring in that Cache Line meets a preset condition, thereby improving the reliability of the Cache. Because isolation is performed at Cache Line granularity, a problem in one or more Cache Lines does not affect the overall use of the processor, which effectively improves processor performance.
When the Cache is accessed, that is, when data is written to or read from it, the detection device 12 can determine whether an error has occurred in a Cache Line and isolate the Cache Line when the error meets a certain condition. Furthermore, when the Cache is not being accessed, the detection device 12 may automatically patrol the Cache and isolate the Cache Lines that meet the condition according to the patrol result (whether a correctable or an uncorrectable error occurred).
Optionally, when the detection device 12 finds that the Cache is idle, it may initiate a patrol task, which uses the ID information of each Cache Line as an identifier and checks the Cache Lines one by one. The patrol task has the lowest priority: as soon as the CPU or an IO device initiates a Cache access, the patrol task can exit, and the next patrol task can resume from the Cache Line at which it exited.
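The preemptible, resumable patrol described above can be modeled as a loop that remembers the next Cache Line ID to check. This is only a sketch: the `Patrol` class, the `cache_busy` callback, and the `check_line` callback are hypothetical stand-ins for hardware signals.

```python
class Patrol:
    """Lowest-priority patrol that yields to Cache accesses and resumes where it stopped."""

    def __init__(self, num_lines: int):
        self.num_lines = num_lines
        self.next_id = 0  # ID of the next Cache Line to check

    def run(self, cache_busy, check_line):
        """Check lines in ID order until the pass completes or an access preempts it.

        Returns the list of line IDs checked during this run.
        """
        visited = []
        while self.next_id < self.num_lines:
            if cache_busy():
                return visited  # preempted; next_id is kept for the next run
            check_line(self.next_id)
            visited.append(self.next_id)
            self.next_id += 1
        self.next_id = 0  # full pass finished; the next pass starts from the beginning
        return visited


patrol = Patrol(num_lines=4)
busy = iter([False, False, True])
print(patrol.run(lambda: next(busy), lambda line_id: None))  # [0, 1]
print(patrol.run(lambda: False, lambda line_id: None))       # [2, 3]
```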
Specifically, the detection device 12 may read the ID field of a Cache Line, which stores the ID information of the Cache Line; the ID information is unique within the Cache. When the Cache is not being accessed, the detection device 12 may patrol the Cache Lines in the Cache in order to actively find problems in them.
A patrol is an automatic read-write check of the Cache initiated by the detection device 12. In this embodiment, without affecting the normal read and write performance of the Cache, data is written to every byte of the Cache Line data field according to the ID information of the Cache Line and then verified, so that errors and damage in the Cache hardware are actively discovered.
Optionally, the patrol process may include: writing data to the Cache Line according to its ID information and verifying the written data. If the number of correctable errors that have occurred in the Cache Line is greater than the corresponding threshold, or an uncorrectable error occurs in the Cache Line, the Cache Line is isolated.
Thus, when the Cache is not busy, the detection device 12 can patrol the Cache using the ID information. During the patrol, if an uncorrectable error occurs in a Cache Line, its Online bit is set to 0; if a correctable error occurs, the count in the CE field is increased, and when that count exceeds the threshold the Online bit is set to 0, thereby isolating the Cache Line.
FIG. 5 is a schematic view of the patrol logic of a processor according to an embodiment of the present application. As shown in FIG. 5, the detection device 12 may read the ID information of a Cache Line, write data to the data field of the Cache Line according to the ID information, and verify it. When it detects that a correctable error has occurred in the Cache Line, it adds 1 to the value of the CE field of the Cache Line and determines whether the value of the CE field is greater than the corresponding threshold; if so, it updates the Online bit of the Cache Line.
Optionally, a check of the Lock bit may also be introduced. As in the previous embodiment, if the value of the CE field is greater than the corresponding threshold and the Lock bit is 0, the detection device 12 sets the Online bit of the Cache Line to 0, which indicates that the Cache Line is isolated and that the CPU and IO devices can no longer access it; if the Lock bit is 1, the Cache Line cannot be isolated. If the value of the CE field is not greater than the corresponding threshold, no isolation action is performed.
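The per-line patrol step can be sketched in software as follows. Several parts are assumptions made only so the example runs: real hardware would rely on ECC to classify errors, whereas this model treats a single flipped bit as correctable and anything more as uncorrectable; the test pattern derived from the ID is arbitrary; and `write_and_read_back` is a hypothetical callback that stands in for the Cache hardware.

```python
def pattern_for(line_id: int, size: int = 64) -> bytes:
    """Hypothetical test pattern derived from the Cache Line ID."""
    return bytes((line_id + i) & 0xFF for i in range(size))


def classify(expected: bytes, observed: bytes) -> str:
    """Assumed model: 0 flipped bits = ok, 1 = correctable (CE), more = uncorrectable (UCE)."""
    flipped = sum(bin(a ^ b).count("1") for a, b in zip(expected, observed))
    if flipped == 0:
        return "ok"
    return "CE" if flipped == 1 else "UCE"


def patrol_line(line: dict, write_and_read_back, threshold: int) -> str:
    """Write the pattern, verify it, and update the CE field and Online bit."""
    expected = pattern_for(line["id"])
    outcome = classify(expected, write_and_read_back(expected))
    if outcome == "UCE":
        line["online"] = False              # uncorrectable error: isolate
    elif outcome == "CE":
        line["ce_count"] += 1
        if line["ce_count"] > threshold and not line["lock"]:
            line["online"] = False          # too many CEs and not locked: isolate
    return outcome


line = {"id": 7, "ce_count": 0, "lock": False, "online": True}
one_bit_flip = lambda data: bytes([data[0] ^ 0x01]) + data[1:]
print(patrol_line(line, one_bit_flip, threshold=0), line["online"])  # CE False
```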
Optionally, after the patrol of all idle Caches has finished, it may restart from the beginning either immediately or after a certain delay. The patrol frequency may be set according to actual needs, for example once every 10 minutes, or it may be determined by the user.
To improve reliability, ECC (Error Correcting Code) protection may also be provided for the newly introduced ID and CE fields. ECC is a technique for detecting and correcting errors, which makes the whole device safer and more stable in operation.
Specifically, ECC may be used to check the ID field and the CE field of a Cache Line. If an error that cannot be corrected is found in the ID field or the CE field, the Online bit of the Cache Line is set to 0 directly, so that the Cache Line is isolated.
In summary, the embodiments of the present application introduce an automatic Cache patrol mechanism. Without affecting the use of the Cache, the detection device 12 automatically patrols all Cache Lines, actively discovers errors, actively triggers the isolation mechanism, and isolates problematic Cache Lines, achieving automatic detection and automatic isolation at Cache Line granularity. This reduces the probability that the CPU later triggers a Cache error in actual operation, greatly improves the reliability of the Cache, reduces the probability of system downtime caused by Cache errors, and improves the reliability of the whole system.
In the technical solution provided in the embodiments of the present application, the processor may further include an MCA (Machine-Check Architecture, a hardware error detection mechanism) device, so that the Cache is monitored through the MCA device.
Fig. 6 is a schematic structural diagram of another processor according to an embodiment of the present application. As shown in fig. 6, the processor may include a Cache, a detection device 12 and an MCA device 13, where the structure, function and connection relationship of the Cache and the detection device 12 are similar to those of the foregoing embodiments, and are not repeated herein.
The detection device 12 may be further connected to the MCA device 13, where after the Cache Line is isolated, the detection device 12 may send the isolation information of the Cache Line to the MCA device 13, and the MCA device 13 is configured to monitor the Cache according to the isolation information of the Cache Line.
The MCA is a set of mechanism for processing hardware errors in the Intel x86 architecture, and can monitor the state of the Cache and provide an indication of whether the Cache is healthy. The Cache is in a green state under normal conditions, and enters a yellow state when certain conditions are met. If a yellow state is detected, a Cache-threshold-trigger (Cache threshold trigger) is triggered, and the system can perform different processing according to whether the error is L1Cache, L2Cache or LLC.
The CPU/system/platform response priority to the "yellow" state should be less than the response to the uncorrected error. An uncorrected error means that a serious error has actually occurred, while the "yellow" state is a warning, and although the Cache has met certain conditions, it may not itself be a serious event, and the error has been corrected, and the system state is not affected.
In the embodiment of the present application, the hardware for implementing the MCA is referred to as the MCA device 13, that is, the MCA device 13 may be any hardware device capable of implementing the MCA function, for example, the MCA function may be implemented by a register or the like.
Optionally, the MCA device 13 may be specifically configured to report to the operating system if the number of isolated Cache Lines is greater than the corresponding limit value.
Specifically, when the CPU accesses the L1 instruction Cache, the L1 data Cache, the L2 Cache, or the Last-Level Cache, or when an IO device accesses the Last-Level Cache through DDIO (Data Direct IO) technology, if a correctable error occurs in a Cache Line, the detection device 12 adds 1 to the CE field of that Cache Line and compares the CE field with the corresponding threshold. If the value of the CE field is greater than or equal to the threshold and the Lock bit is 0, the detection device 12 sets the Online bit of the Cache Line to 0, indicating that the Cache Line is isolated and can no longer be accessed by the CPU or IO devices. If the Lock bit is 1, the Cache Line cannot be isolated. If the value of the CE field is less than the corresponding threshold, no isolation action is performed.
Meanwhile, the detection device 12 sends the isolation information of the Cache Line to the MCA device 13. The isolation information may include the ID information of the Cache Line, the type and location of the error, isolation result information, and the like, and may be reported to the operating system through the CMCI (Corrected Machine Check Interrupt) of the MCA.
After receiving the isolation information of the Cache Line, the MCA device 13 records the number of isolated Cache Lines in the Cache. If this number exceeds the corresponding limit value, the MCA device 13 reports the Cache to the operating system as being in the "yellow" state; otherwise, the Cache is in the "green" state.
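The per-line bookkeeping described above can be illustrated with a minimal software sketch. The embodiment describes hardware behavior; the `CacheLine` class and the `on_correctable_error` function below are hypothetical names introduced only for illustration, and the sketch assumes the "greater than or equal to" comparison stated in this paragraph.

```python
from dataclasses import dataclass


@dataclass
class CacheLine:
    """Hypothetical software model of the per-line fields named in the text."""
    line_id: int
    ce: int = 0      # CE field: number of correctable errors seen so far
    online: int = 1  # Online bit: 1 = usable, 0 = isolated
    lock: int = 0    # Lock bit: 1 = the line must not be isolated


def on_correctable_error(line: CacheLine, threshold: int) -> bool:
    """Count one correctable error; return True if the line was isolated."""
    line.ce += 1
    # Isolate only when the count reaches the threshold and the line is unlocked.
    if line.ce >= threshold and line.lock == 0:
        line.online = 0
        return True
    return False
```

In this sketch a locked line keeps counting errors but stays online, which matches the statement that isolation is not possible when the Lock bit is 1.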
After detecting the yellow state of the Cache, the operating system can record the yellow state in a log and call the Cache-threshold-trigger to isolate the CPU Core or the CPU Die where the Cache is located. Only the log is recorded if it is in the "green" state. The specific processing method of the operating system when the Cache is in the "yellow" state or the "green" state belongs to the prior art, and is not described in detail in this embodiment.
Similarly, if a Cache Line is isolated during automatic inspection of the Cache by the detection device 12, the detection device 12 may handle it in the same way and send the isolation information of the Cache Line to the MCA device 13. After receiving the isolation information, the MCA device 13 records the number of isolated Cache Lines in the Cache; if this number exceeds the corresponding limit value, it reports the Cache to the operating system as being in the "yellow" state, and otherwise as being in the "green" state.
Alternatively, different limit values may be configured for different CPU cores or different Cache types, and a determination may be made for each CPU Core or each Cache type, respectively. When the number of isolated Cache lines in a certain type of Cache of a certain CPU Core is larger than a corresponding limit value, marking the Cache as a yellow state, reporting the yellow state to an operating system, and carrying out further processing by the operating system.
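As a rough illustration of the counting performed by the MCA device 13, the following sketch keeps one counter per CPU Core and Cache type and derives the "yellow" or "green" state from the configured limit value. The `McaMonitor` class, its method names, and the string keys are assumptions made for this example and do not describe any real MCA interface.

```python
from collections import defaultdict


class McaMonitor:
    """Hypothetical model: counts isolated lines per (core, cache type)."""

    def __init__(self, limits):
        # limits maps (core_id, cache_type) -> limit on isolated lines
        self.limits = limits
        self.isolated = defaultdict(int)

    def record_isolation(self, core_id, cache_type):
        """Record one newly isolated line and return the resulting state."""
        self.isolated[(core_id, cache_type)] += 1
        return self.state(core_id, cache_type)

    def state(self, core_id, cache_type):
        """'yellow' once the isolated count exceeds the limit, else 'green'."""
        key = (core_id, cache_type)
        return "yellow" if self.isolated[key] > self.limits[key] else "green"
```

Keeping the counters separate per core and per type means that a degraded L1 data Cache on one CPU Core does not change the reported state of any other Cache.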
In the embodiments of the application, the correctable-error threshold and the limit value for each Cache may be stored in MSRs (Model Specific Registers). Because different CPU Cores and different types of Cache may correspond to different thresholds and limit values, the MSRs may specifically be used to store the threshold and limit value corresponding to the L1 data Cache, the L1 instruction Cache, and the L2 Cache of each CPU Core, and the threshold and limit value corresponding to the LLC of each CPU Die.
Optionally, for each CPU Core, MSRs may be introduced to store the following data:
MSR_L1I_CE_THRESHOLD: the correctable-error threshold for the L1 instruction Cache; when this threshold is exceeded, the detection device 12 sets the Online bit of the Cache Line to 0;
MSR_L1D_CE_THRESHOLD: the correctable-error threshold for the L1 data Cache; when this threshold is exceeded, the detection device 12 sets the Online bit of the Cache Line to 0;
MSR_L2C_CE_THRESHOLD: the correctable-error threshold for the L2 Cache; when this threshold is exceeded, the detection device 12 sets the Online bit of the Cache Line to 0;
MSR_L1I_ISOL_LTD: the limit value for the number of isolated Cache Lines in the L1 instruction Cache; when this value is exceeded, the MCA device 13 reports that the Cache is in the "yellow" state;
MSR_L1D_ISOL_LTD: the limit value for the number of isolated Cache Lines in the L1 data Cache; when this value is exceeded, the MCA device 13 reports that the Cache is in the "yellow" state;
MSR_L2C_ISOL_LTD: the limit value for the number of isolated Cache Lines in the L2 Cache; when this value is exceeded, the MCA device 13 reports that the Cache is in the "yellow" state.
For each CPU Die, MSRs may be introduced to store the following data:
MSR_LLC_CE_THRESHOLD: the correctable-error threshold for the Last-Level Cache; when this threshold is exceeded, the detection device 12 sets the Online bit of the Cache Line to 0;
MSR_LLC_ISOL_LTD: the limit value for the number of isolated Cache Lines in the Last-Level Cache; when this value is exceeded, the MCA device 13 reports that the Cache is in the "yellow" state.
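The register layout listed above can be pictured as two small tables, one per CPU Core and one per CPU Die. In the sketch below the register names are taken from the text, while the numeric values, the `MSR_PREFIX` mapping, and the `lookup` function are hypothetical and serve only to show how a threshold and a limit value might be selected for a given Cache type.

```python
# Hypothetical values; the text names the registers but gives no numbers.
CORE_MSRS = {
    "MSR_L1I_CE_THRESHOLD": 4,
    "MSR_L1D_CE_THRESHOLD": 4,
    "MSR_L2C_CE_THRESHOLD": 6,
    "MSR_L1I_ISOL_LTD": 8,
    "MSR_L1D_ISOL_LTD": 8,
    "MSR_L2C_ISOL_LTD": 16,
}
DIE_MSRS = {
    "MSR_LLC_CE_THRESHOLD": 8,
    "MSR_LLC_ISOL_LTD": 64,
}

# Assumed short names for the four Cache types, mapped to register prefixes.
MSR_PREFIX = {"L1I": "MSR_L1I", "L1D": "MSR_L1D", "L2": "MSR_L2C", "LLC": "MSR_LLC"}


def lookup(cache_type, core_msrs=CORE_MSRS, die_msrs=DIE_MSRS):
    """Return (CE threshold, isolation limit) for the given Cache type."""
    # LLC settings are kept per CPU Die; all other types are kept per CPU Core.
    bank = die_msrs if cache_type == "LLC" else core_msrs
    prefix = MSR_PREFIX[cache_type]
    return bank[prefix + "_CE_THRESHOLD"], bank[prefix + "_ISOL_LTD"]
```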
Further, after the MCA device 13 reports the state of the Cache to the operating system, the operating system may isolate the CPU Core or the entire CPU Die using the Cache-threshold-trigger. Alternatively, if the number of isolated LLC lines is greater than the corresponding limit value, the corresponding CPU Die may be isolated; if the isolated line number of the non-LLC Cache is larger than the corresponding limit value, the corresponding CPU Core can be isolated, and the specific implementation method of isolating the CPU Die or the CPU Core can refer to the prior art.
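The choice between isolating a CPU Core and isolating a CPU Die can be summarized in a few lines. This is a sketch only; the function name `isolation_target` and the string labels are assumptions, and the actual isolation procedure is left to the prior art as stated above.

```python
def isolation_target(cache_type, isolated_lines, limit):
    """Return which unit the operating system might isolate, or None."""
    if isolated_lines <= limit:
        return None  # limit not exceeded: nothing to isolate
    # The LLC is shared at die level; other Cache types belong to one core.
    return "CPU Die" if cache_type == "LLC" else "CPU Core"
```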
In summary, in this embodiment of the application, when a Cache Line is isolated, the detection device 12 can send the isolation information to the MCA device 13, and the Cache is monitored through the MCA device 13, which further improves system reliability. The Cache Line isolation scheme provided by this embodiment can be combined seamlessly with Intel's existing MCA architecture, which ensures compatibility during implementation. In addition, different thresholds and limit values can be set for different types of Cache, which meets the usage requirements of different Cache types and helps ensure normal operation of the system.
The above description takes the Cache as an example. The technical ideas of the embodiments of the present application are also applicable to other caches in the processor, such as a TLB, for example an MMU (Memory Management Unit) TLB, an IOMMU (Input/Output Memory Management Unit) TLB, an SMMU (System Memory Management Unit) TLB, and the like. The cache line corresponding to a TLB may be a TLB Line.
When the scheme is applied to another kind of cache, that cache takes the place of the Cache, and its cache line takes the place of the Cache Line.
For example, in the foregoing scheme, the TLB is used instead of the Cache, and the TLB Line is used instead of the Cache Line, so that a scheme for monitoring the TLB can be obtained.
The TLB may also be divided into a plurality of types, for example, the types of the TLB may include a first-level TLB, a second-level TLB, and the like. Similar to Cache, corresponding thresholds and limit values may be set for different TLB types, respectively.
Optionally, a Cache and a TLB may be provided in the processor, and both may be connected to the detection device 12, where the detection device 12 may monitor the Cache and the TLB at the same time.
The implementation procedure of the processing method provided in the embodiment of the present application is described below with reference to the following method embodiments and the accompanying drawings. In addition, the sequence of steps in the method embodiments described below is only an example and is not strictly limited.
Fig. 7 is a flowchart of a cache processing method according to an embodiment of the present application. As shown in fig. 7, the cache processing method in this embodiment may include:
Step 701, obtaining error information of at least one cache line in a plurality of cache lines.
Step 702, if the error information meets a preset condition, isolating the at least one cache line.
The execution body of the cache processing method provided in this embodiment may be a processor as described in any of the foregoing embodiments, and the specific implementation principle and process of the method may refer to the foregoing embodiments, which are not described herein again.
Optionally, the error information includes the number of times that a correctable error occurs and/or the number of times that an uncorrectable error occurs.
Optionally, if the error information meets a preset condition, isolating the at least one cache line may include: if the number of times that any cache line generates a correctable error is greater than a corresponding threshold, or if the cache line generates an uncorrectable error, isolating the cache line.
Optionally, obtaining the error information of the cache line may include: if it is detected that the cache line has a correctable error, adding 1 to the value of the CE field, and reading the value of the CE field; the CE field is used for storing the number of times a correctable error has occurred in the cache line.
Optionally, if the number of times that a correctable error occurs in any cache line is greater than a corresponding threshold, isolating the cache line may include: judging whether the value of the CE field is greater than the corresponding threshold; if so, updating the Online bit of the cache line; wherein the Online bit is used to indicate whether a cache line is isolated.
Optionally, the method may further include: when the cache line is not accessed, data is written in the cache line and verified.
Optionally, the method may further include: and if the number of the isolated cache lines in any cache is greater than the corresponding limit value, isolating the cache.
In summary, the cache processing method provided by this embodiment obtains error information of at least one cache line among a plurality of cache lines and isolates the at least one cache line if the error information meets a preset condition, which can effectively improve the reliability of the cache. Because isolation is performed at cache line granularity, a problem in one or more cache lines does not affect the overall use of the processor, which effectively improves the performance of the processor.
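The isolation condition of step 702 and the optional write-and-verify check on an idle cache line can be sketched as follows. The `LineStore` class is a toy model with an assumed stuck-at-zero fault, and the names `should_isolate` and `patrol_check`, as well as the test pattern 0xA5, are hypothetical. The comparison uses "greater than", as stated for this method embodiment.

```python
def should_isolate(ce_count, ue_occurred, threshold):
    """Preset condition: too many correctable errors, or any uncorrectable one."""
    return ue_occurred or ce_count > threshold


class LineStore:
    """Toy storage; stuck maps a line index to bits that always read back as 0."""

    def __init__(self, size, stuck=None):
        self.data = [0] * size
        self.stuck = stuck or {}

    def write(self, index, value):
        # Bits marked as stuck are forced to 0, modelling a faulty cell.
        self.data[index] = value & ~self.stuck.get(index, 0) & 0xFF

    def read(self, index):
        return self.data[index]


def patrol_check(store, index, pattern=0xA5):
    """Write a pattern to an idle line and verify that it reads back intact."""
    store.write(index, pattern)
    return store.read(index) == pattern
```

A line that fails `patrol_check` would then be treated as having an error and handled by the same isolation condition.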
Fig. 8 is a flow chart of a data processing method according to an embodiment of the present application. As shown in fig. 8, the data processing method in this embodiment may include:
Step 801, obtaining a threshold value corresponding to a cache input by a user;
Step 802, storing the threshold value corresponding to the cache, so that a cache line in the cache is isolated when the number of times a correctable error occurs in it is greater than the threshold value.
The data processing method provided by the embodiment of the application can be implemented based on the processor provided by each of the foregoing embodiments. For parts of this embodiment that are not described in detail, reference may be made to the above-described embodiments.
Optionally, obtaining the threshold corresponding to the cache input by the user may include: displaying the type of cache in the processor; and acquiring threshold values corresponding to the caches of the various types input by the user.
Optionally, the data processing method may further include: acquiring a limit value corresponding to the cache input by a user; storing the limit value corresponding to the cache; if the number of isolated cache lines is greater than the corresponding limit value, the corresponding CPU Core or CPU Die is isolated.
Specifically, the threshold value corresponding to the cache may be stored in the MSR, and the threshold values and the limit values corresponding to different CPU cores and different types of caches may be the same or different. Alternatively, the threshold and limit values corresponding to the cache may be stored in a manner as provided in the above embodiments.
According to the data processing method provided by this embodiment of the application, the user is allowed to set the threshold corresponding to the cache, so that a cache line is isolated when its number of errors is greater than the threshold. The user can set the threshold according to the actual usage of the device, which effectively meets cache monitoring requirements in different scenarios and improves the reliability of the system.
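Steps 801 and 802 can be illustrated by a small sketch in which user-supplied thresholds are validated and stored per cache type. The function name `store_thresholds`, the dictionary standing in for the MSRs, and the requirement that a threshold be a positive integer are assumptions of this example rather than details given in the embodiment.

```python
def store_thresholds(user_values, cache_types, registers):
    """Validate the user's threshold for each cache type and store it."""
    for cache_type in cache_types:
        value = int(user_values[cache_type])
        if value < 1:
            raise ValueError(f"threshold for {cache_type} must be positive")
        # registers is a plain dict standing in for the per-type MSR storage.
        registers[cache_type] = value
    return registers
```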
A data processing apparatus of one or more embodiments of the present application will be described in detail below. Those skilled in the art will appreciate that these data processing devices may be configured using commercially available hardware components through the steps taught by the present solution.
Fig. 9 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application. As shown in fig. 9, the apparatus may include:
an obtaining module 901, configured to obtain a threshold value corresponding to a cache line input by a user;
and the storage module 902 is configured to store a threshold value corresponding to the cache line, so that the cache line is isolated when the number of times of errors occurring is greater than the threshold value.
Optionally, the acquiring module 901 may specifically be configured to: displaying the type of cache in the processor; and acquiring threshold values corresponding to the caches of the various types input by the user.
Optionally, the data processing device may be further configured to: acquiring a limit value corresponding to the cache input by a user; storing the limit value corresponding to the cache; if the number of isolated cache lines is greater than the corresponding limit value, the corresponding CPU Core or CPU Die is isolated.
The apparatus shown in fig. 9 may perform the data processing method described above, and reference is made to the description of the foregoing embodiment for a portion not described in detail in this embodiment. The implementation process and technical effects of this technical solution are referred to the description in the foregoing embodiments, and are not repeated here.
The embodiment of the application also provides electronic equipment, which comprises the processor of any one of the embodiments. The electronic device may be a computer or the like or any other type of electronic device such as a tablet device, smart phone or the like.
Optionally, the electronic device may further include: a memory; the memory is configured to store one or more computer instructions that, when executed by the processor, implement the data processing method of any of the above.
Fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 10, the electronic device may include: a processor 21 and a memory 22. Wherein the memory 22 is for storing a program for supporting the electronic device to execute the data processing method provided in any of the foregoing embodiments, and the processor 21 is configured for executing the program stored in the memory 22.
The program comprises one or more computer instructions which, when executed by the processor 21, are capable of carrying out the steps of:
acquiring a threshold value corresponding to the cache input by a user;
and storing a threshold value corresponding to the cache, so that the cache line in the cache is isolated when the number of times of correctable errors is greater than the threshold value.
Optionally, the processor 21 is further configured to perform all or part of the steps in the embodiment shown in fig. 8.
The structure of the electronic device may further include a communication interface 23, which is used for the electronic device to communicate with other devices or a communication network.
Additionally, embodiments of the present application provide a computer-readable storage medium storing computer instructions that, when executed by a processor, cause the processor to perform actions comprising:
acquiring a threshold value corresponding to the cache input by a user;
and storing a threshold value corresponding to the cache, so that the cache line in the cache is isolated when the number of times of correctable errors is greater than the threshold value.
The computer instructions, when executed by a processor, may also cause the processor to perform all or part of the steps involved in the data processing method described in the foregoing embodiments.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of a necessary general-purpose hardware platform, or by a combination of hardware and software. Based on such understanding, the foregoing technical solutions, in essence or in the portions contributing to the prior art, may be embodied in the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, Random Access Memory (RAM), and/or nonvolatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium which can be used to store information that can be accessed by a computing device. Computer-readable media, as defined herein, does not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting thereof; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims (19)

1. A processor, comprising: a cache and a detection device;
the cache includes a plurality of cache lines;
the detection device is used for isolating at least one cache line when an error occurring in the at least one cache line in the plurality of cache lines meets a preset condition;
the detection device is specifically used for: if the number of times that any cache line generates a correctable error is greater than a corresponding threshold, or if the cache line generates an uncorrectable error, isolating the cache line.
2. The processor of claim 1, wherein the cache line comprises: a correctable error (CE) field and an Online bit;
the CE field is used for storing the number of times a correctable error has occurred in the cache line;
the Online bit is used to indicate whether a cache line is isolated.
3. The processor according to claim 2, wherein the detection device is specifically configured to: if it is detected that a correctable error has occurred in the cache line, add 1 to the value of the CE field, judge whether the value of the CE field is greater than a corresponding threshold, and if so, update the Online bit of the cache line.
4. The processor of claim 2, wherein the cache line further comprises: an identity ID field;
The ID field is used to store ID information of the cache line.
5. The processor of claim 4, wherein the detecting means is further for:
when the cache line is not accessed, data is written into the cache line through the ID information of the cache line, and the data is checked.
6. The processor according to claim 3, further comprising: a Machine-Check Architecture (MCA) device;
the detection device is also used for sending the isolation information of the cache line to the MCA device after the cache line is isolated;
the MCA device is used for monitoring the cache according to the isolation information of the cache line.
7. The processor of claim 6, wherein the MCA device is specifically configured to: report to the operating system if the number of isolated cache lines is greater than the corresponding limit value.
8. The processor of any one of claims 1-7, wherein the cache comprises a Cache and/or a translation lookaside buffer (TLB);
the cache line is a Cache Line or a translation lookaside buffer line (TLB Line).
9. The processor of claim 8, wherein the processor comprises one or more central processing unit dies (CPU Die), each CPU Die comprising one or more central processing unit cores (CPU Core);
The cache is of multiple types; each CPU Core is correspondingly provided with at least one type of said cache and one of said detecting means.
10. The processor of claim 9, wherein the types of caches include a first level instruction Cache, a first level data Cache, a second level Cache, and a last level Cache;
the first-stage instruction Cache, the first-stage data Cache and the second-stage Cache are arranged in the CPU Core, and the last-stage Cache is arranged outside the CPU Core.
11. The processor of claim 9, wherein the types of TLBs include a first level TLB, a second level TLB.
12. The processor of claim 8, further comprising: model specific registers MSR;
the MSR is used for storing the correctable-error threshold corresponding to the cache line and the limit value corresponding to the cache.
13. The processor of claim 12, wherein the MSR is specifically configured to: storing threshold values and limit values corresponding to a first-stage instruction Cache, a first-stage data Cache and a second-stage Cache of each CPU Core; and storing the threshold value and the limit value corresponding to the last-stage Cache of each CPU Die.
14. A cache processing method, comprising:
Obtaining error information of at least one cache line in a plurality of cache lines;
if the error information meets a preset condition, isolating the at least one cache line;
the error information comprises the number of times of occurrence of correctable errors and/or the number of times of occurrence of uncorrectable errors;
if the error information meets a preset condition, isolating the at least one cache line includes: if the number of times that any cache line generates a correctable error is greater than a corresponding threshold, or if the cache line generates an uncorrectable error, isolating the cache line.
15. The method of claim 14, wherein obtaining error information for a cache line comprises:
if it is detected that the cache line has a correctable error, adding 1 to the value of the CE field;
reading the value of the CE field;
the CE field is used for storing the number of times a correctable error has occurred in the cache line.
16. The method of claim 15, wherein isolating any cache line if the number of correctable errors occurring in the cache line is greater than a corresponding threshold comprises:
judging whether the value of the CE field is greater than the corresponding threshold;
if so, updating the Online bit of the cache line;
Wherein the Online bit is used to indicate whether a cache line is isolated.
17. The method according to any one of claims 14 to 16, further comprising:
when the cache line is not accessed, data is written in the cache line and verified.
18. The method according to any one of claims 14 to 17, further comprising:
and if the number of the isolated cache lines in any cache is greater than the corresponding limit value, isolating the cache.
19. An electronic device, comprising: the processor of any one of claims 1-13.
CN201811408253.7A 2018-11-23 2018-11-23 Processor, cache processing method and electronic equipment Active CN111221775B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811408253.7A CN111221775B (en) 2018-11-23 2018-11-23 Processor, cache processing method and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811408253.7A CN111221775B (en) 2018-11-23 2018-11-23 Processor, cache processing method and electronic equipment

Publications (2)

Publication Number Publication Date
CN111221775A CN111221775A (en) 2020-06-02
CN111221775B true CN111221775B (en) 2023-06-20

Family

ID=70828639

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811408253.7A Active CN111221775B (en) 2018-11-23 2018-11-23 Processor, cache processing method and electronic equipment

Country Status (1)

Country Link
CN (1) CN111221775B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113010338B (en) * 2021-02-19 2022-11-15 山东英信计算机技术有限公司 Error leakage threshold value adjusting method, device, equipment and medium of memory CE
CN116204455B (en) * 2023-04-28 2023-09-22 阿里巴巴达摩院(杭州)科技有限公司 Cache management system, method, private network cache management system and equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105242978A (en) * 2015-11-19 2016-01-13 东软集团股份有限公司 Method and device for processing failure of cache lines of CPU under multi-thread condition
CN105677581A (en) * 2016-01-05 2016-06-15 上海斐讯数据通信技术有限公司 Internal storage access device and method
CN107423234A (en) * 2016-04-18 2017-12-01 联发科技股份有限公司 Multicomputer system and caching sharing method
CN108182125A (en) * 2017-12-27 2018-06-19 武汉理工大学 The detection of cache multidigit hard error and fault tolerance facility and method under nearly threshold voltage

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6950978B2 (en) * 2001-03-29 2005-09-27 International Business Machines Corporation Method and apparatus for parity error recovery
US8095831B2 (en) * 2008-11-18 2012-01-10 Freescale Semiconductor, Inc. Programmable error actions for a cache in a data processing system
US8392929B2 (en) * 2009-12-15 2013-03-05 Microsoft Corporation Leveraging memory isolation hardware technology to efficiently detect race conditions
GB2528901B (en) * 2014-08-04 2017-02-08 Ibm Uncorrectable memory errors in pipelined CPUs

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘翔等."动态内存分配器研究综述".《计算机学报》.2018,第41卷(第10期),第2359-2379页. *

Also Published As

Publication number Publication date
CN111221775A (en) 2020-06-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant