CN112948160B - Method and device for positioning and repairing memory ECC problem - Google Patents

Info

Publication number
CN112948160B
Authority
CN
China
Prior art keywords
ecc
memory
server
test
bios
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110219990.8A
Other languages
Chinese (zh)
Other versions
CN112948160A (en
Inventor
许雪雪
姜庆臣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Yingxin Computer Technology Co Ltd
Original Assignee
Shandong Yingxin Computer Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Yingxin Computer Technology Co Ltd filed Critical Shandong Yingxin Computer Technology Co Ltd
Priority to CN202110219990.8A priority Critical patent/CN112948160B/en
Publication of CN112948160A publication Critical patent/CN112948160A/en
Application granted granted Critical
Publication of CN112948160B publication Critical patent/CN112948160B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/073 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a memory management context, e.g. virtual memory or cache management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793 Remedial or corrective actions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1008 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices
    • G06F11/1012 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices using codes or arrangements adapted for a specific type of error
    • G06F11/1016 Error in accessing a memory location, i.e. addressing error

Abstract

The invention provides a method for locating and repairing memory ECC problems, comprising: communicatively connecting the client servers to be tested to a server-side server, the server-side server controlling a plurality of client servers to be tested to perform memory ECC problem testing; and obtaining, by the server-side server, a test log of each client server to be tested, and dividing the memory ECC problems into repairable ECC problems and unrepairable ECC problems according to the test log. The invention also provides a device for locating and repairing memory ECC problems. Memory modules with repairable ECC problems are repaired by modifying a BIOS option, which improves the quality of the server memory.

Description

Method and device for positioning and repairing memory ECC problem
Technical Field
The present invention relates to the handling of memory ECC problems, and in particular to a method and an apparatus for locating and repairing a memory ECC problem.
Background
In the server field, RamOS (an operating system that runs in memory, i.e. a diskless system) is increasingly common, especially in development, testing, and production. Because of how RamOS works, problems or probabilistic events that occur during stress testing are hard to locate, and even once a problem is found, it takes time to work out which hardware location on the server it corresponds to; recurring problems consume a great deal of time and labor.
In the prior art, when an ECC (Error Correcting Code) problem occurs and the machine has not crashed, the slot corresponding to the ECC problem can be located by capturing the log information directly; if the machine has crashed, the problem must be reproduced while log information is captured in real time over a serial cable.
However, capturing log information over a serial cable is time-consuming. Capturing the log directly requires reproducing the problem, which takes a long time, and a probabilistic problem is hard to reproduce even though it can still occur. Moreover, once an existing memory ECC problem has been located, the affected memory can only be masked off; it cannot be repaired, which does not help resolve the server's memory problem.
Disclosure of Invention
To solve the problems in the prior art, the invention provides a method and a device for locating and repairing memory ECC problems, so that memory modules with repairable ECC problems are repaired, the quality of the server memory is improved, and the reliability of server memory ECC testing is effectively improved.
The first aspect of the present invention provides a method for locating and repairing a memory ECC problem, including:
communicatively connecting the client servers to be tested to a server-side server, the server-side server controlling a plurality of client servers to be tested to perform memory ECC problem testing;
obtaining, by the server-side server, a test log of each client server to be tested, and dividing the memory ECC problems into repairable ECC problems and unrepairable ECC problems according to the test log;
and locating the memory with a repairable ECC problem, counting the number of error reports for that memory, and, if the number of error reports exceeds a preset value, automatically repairing the located memory module with the repairable ECC problem by modifying a BIOS option.
Optionally, communicatively connecting the client servers to be tested to the server-side server, with the server-side server controlling a plurality of client servers to be tested to perform memory ECC problem testing, specifically includes:
building a network test environment, and connecting each client server to be tested and the server-side server to the same switch, so that each client server to be tested and the server-side server are in the same network segment;
configuring an operating system and a kernel on the server-side server, and connecting each client server to be tested to the server-side server by PXE (Preboot eXecution Environment) boot;
setting each client server to start the test automatically at boot, simulating the user's actual usage scenario, and performing the memory ECC problem test;
during the test, if a memory MCE error occurs, terminating the test and recording a test log.
Optionally, dividing the memory ECC problems into repairable ECC problems and unrepairable ECC problems according to the test log specifically includes:
detecting whether a repairable flag field exists in the test log and, if so, determining that the ECC problem is a repairable ECC problem;
and detecting whether an unrepairable flag field exists in the test log and, if so, determining that the ECC problem is an unrepairable ECC problem.
Further, the repairable flag field is 0xa0, and the unrepairable flag field is 0xa1.
Optionally, automatically repairing the located memory module with the repairable ECC problem by modifying the BIOS option specifically includes:
exporting the BIOS options with a BIOS tool, and changing the memory enhancement test option in the BIOS options to the test-and-repair option;
and, after the modification is complete, importing the options back into the BIOS, confirming that the BIOS option was modified successfully, and restarting the server so that the repair runs automatically.
Further, the memory enhancement test option is Enhanced Memory Test, and the test-and-repair option is Test and Repair.
Optionally, the BIOS supports memory enhancement functions.
Optionally, the method further comprises: if the number of memory error reports does not exceed the preset value, restarting the client server and performing the memory ECC problem test again.
Optionally, the method further comprises: locating the memory with an unrepairable ECC problem and analyzing the cause of the problem.
The second aspect of the present invention provides a device for locating and repairing a memory ECC problem, comprising:
a test module, used for communicatively connecting the client servers to be tested to the server-side server, the server-side server controlling a plurality of client servers to be tested to perform memory ECC problem testing;
a dividing module, used for the server-side server to obtain a test log of each client server to be tested and to divide the memory ECC problems into repairable ECC problems and unrepairable ECC problems according to the test log;
and a locating and repairing module, used for locating the memory with a repairable ECC problem, counting the number of error reports for that memory, and, if the number of error reports exceeds a preset value, automatically repairing the located memory module with the repairable ECC problem by modifying a BIOS option.
The technical scheme adopted by the invention comprises the following technical effects:
1. The ECC problems arising in memory testing are classified into repairable and unrepairable ECC problems, and memory modules with repairable ECC problems are repaired by modifying a BIOS option, which improves the quality of the server memory and effectively improves the reliability of server memory ECC testing.
2. By modifying the BIOS option, a memory module with an ECC problem can be repaired automatically while the server restarts, which improves the efficiency of memory module repair.
3. If the number of memory error reports does not exceed the preset value, the client server is restarted and the memory ECC problem test is performed again, which avoids mislocating ECC problems that are not caused by the memory and improves the reliability of memory ECC testing.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing them are briefly introduced below; those skilled in the art can obtain other drawings from these without creative effort.
FIG. 1 is a schematic flow diagram of the method according to embodiment one of the present invention;
FIG. 2 is a schematic flow chart of step S1 of the method according to embodiment one of the present invention;
FIG. 3 is another schematic flow diagram of the method according to embodiment one of the present invention;
FIG. 4 is a schematic structural diagram of the apparatus according to embodiment two of the present invention.
Detailed Description
In order to clearly explain the technical features of the present invention, the present invention will be explained in detail by the following embodiments and the accompanying drawings. The following disclosure provides many different embodiments, or examples, for implementing different features of the invention. To simplify the disclosure of the present invention, specific example components and arrangements are described below. Moreover, the present disclosure may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed. It should be noted that the components illustrated in the figures are not necessarily drawn to scale. Descriptions of well-known components and processing techniques and procedures are omitted so as to not unnecessarily limit the invention.
Example one
As shown in FIG. 1, the present invention provides a method for locating and repairing a memory ECC problem, including:
S1, communicatively connecting the client servers to be tested to a server-side server, the server-side server controlling a plurality of client servers to be tested to perform memory ECC problem testing;
S2, obtaining, by the server-side server, a test log of each client server to be tested, and dividing the memory ECC problems into repairable ECC problems and unrepairable ECC problems according to the test log;
S3, locating the memory with a repairable ECC problem, and counting the number of error reports for that memory;
S4, determining whether the counted number exceeds a preset value, and if so, executing step S5; otherwise, executing step S6;
S5, automatically repairing the located memory module with the repairable ECC problem by modifying the BIOS option;
and S6, restarting the client server and performing the memory ECC problem test again.
As shown in FIG. 2, step S1 specifically includes:
S11, building a network test environment, and connecting each client server to be tested and the server-side server to the same switch, so that each client server to be tested and the server-side server are in the same network segment;
S12, configuring an operating system and a kernel on the server-side server, and connecting each client server to be tested to the server-side server by PXE (Preboot eXecution Environment) boot;
S13, setting each client server to start the test automatically at boot, simulating the user's actual usage scenario, and performing the memory ECC problem test;
and S14, during the test, if a memory MCE error occurs, terminating the test and recording a test log.
In step S11, the client servers to be tested and the server-side server are placed in the same network segment by automatically assigning IP addresses, subnet masks, broadcast addresses, and the like.
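The same-segment condition of step S11 can be checked mechanically. The following is a minimal sketch using Python's standard `ipaddress` module; the function name `same_segment` and the example addresses are illustrative assumptions and do not come from the patent.

```python
import ipaddress


def same_segment(server_cidr: str, client_cidrs: list[str]) -> dict[str, bool]:
    """Map each client address (in CIDR form) to whether it shares the server's network.

    Two hosts are treated as being in the same segment when their address and
    subnet mask yield the same network (and therefore the same broadcast address).
    """
    server_net = ipaddress.ip_interface(server_cidr).network
    return {
        cidr: ipaddress.ip_interface(cidr).network == server_net
        for cidr in client_cidrs
    }


# Hypothetical addresses, for illustration only.
result = same_segment("192.168.10.1/24", ["192.168.10.21/24", "192.168.11.5/24"])
```

A client that reports `False` here would not be reachable for PXE boot from the server-side server without routing, so it is worth checking before the test starts.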
In step S12, an operating system (OS), a kernel, and a boot file (mac) are configured on the server-side server, and each client server to be tested boots via PXE (Preboot eXecution Environment) to establish a connection with the server-side server.
In step S13, starting at boot may be implemented by a program, and the program may specifically be an etc/rc. Once start-at-boot is configured, the client server to be tested automatically enters the operating system after being powered on or restarted, and the test program is run automatically or manually to simulate the user's actual usage scenario.
In step S14, if a memory MCE (Machine Check Exception, an exception triggered when the CPU detects a hardware error) occurs during the test, the test is terminated and a test log is recorded through messages, the log file that a Linux system provides under the /var/log directory, so that the system test log is captured.
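As a rough illustration of step S14, the sketch below scans lines of a messages-style log for machine-check reports and decides whether the test should stop. The matching pattern is an assumption: kernel wording differs between versions and platforms, and the patent does not specify what is matched. The function names are hypothetical.

```python
import re

# Assumed pattern: lines tagged "mce:" or mentioning "machine check".
# Real kernel messages vary, so adjust this for the system under test.
MCE_PATTERN = re.compile(r"\bmce:|machine check", re.IGNORECASE)


def find_mce_lines(log_lines):
    """Return the log lines that look like machine-check reports."""
    return [line.rstrip("\n") for line in log_lines if MCE_PATTERN.search(line)]


def should_terminate_test(log_lines):
    """The test is terminated as soon as any machine-check line is present."""
    return bool(find_mce_lines(log_lines))


# Illustrative log content, not taken from a real system.
sample = [
    "Feb 26 10:00:01 host kernel: mce: [Hardware Error]: Machine check events logged\n",
    "Feb 26 10:00:02 host systemd[1]: Started Session 1 of user root.\n",
]
hits = find_mce_lines(sample)
```

In practice the lines would be read from the messages file under /var/log, and the matching lines would be copied into the test log before the test is stopped.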
In step S2, the client server to be tested returns the test log to the server-side server by means of NTFS (log file system); the server-side server obtains the test log of the client server to be tested and uses the ipmitool tool to save and export its SEL log (a part of the test log). If the client is powered on and running, the log can be saved with a first command, ipmitool sel save sel.log. If the client is down, the log can be saved with a second command, ipmitool -I lanplus -H ip -U user -P password sel save sel.log.
The SEL log is the system event log, obtained with ipmitool, a management tool under Linux; exceptions during the test are recorded in the SEL log, so the SEL is the key log for events triggered by the system. The first command, ipmitool sel save sel.log, saves the SEL log locally so that it can be viewed; in the second command, -I lanplus selects the IPMI LAN interface, the value after -H is the system IP address, -U is the user name, -P is the password, and sel save sel.log saves the log as in the first command.
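The local and remote forms of the SEL export differ only in the connection options, which a small helper can make explicit. This is a sketch under assumptions: the helper name `ipmitool_args` is hypothetical, the output file name `sel.log` is assumed, and the host and credentials are placeholders. It only builds the argument list and does not run anything.

```python
def ipmitool_args(subcommand, host=None, user=None, password=None):
    """Build an ipmitool argument list.

    With no host, the command targets the local BMC (client is up).
    With a host, the lanplus options are added so the server-side server
    can query the BMC of a client that is down.
    """
    args = ["ipmitool"]
    if host is not None:
        args += ["-I", "lanplus", "-H", host, "-U", user, "-P", password]
    return args + list(subcommand)


# First command: client is running, export locally (file name assumed).
local_cmd = ipmitool_args(["sel", "save", "sel.log"])

# Second command: client is down, export over the network (placeholder values).
remote_cmd = ipmitool_args(
    ["sel", "save", "sel.log"], host="192.0.2.10", user="user", password="password"
)
```

The resulting list can be passed to `subprocess.run(..., check=True)` on a machine where ipmitool is installed; keeping the command as a list avoids shell quoting problems with passwords.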
Dividing the memory ECC problems into repairable ECC problems and unrepairable ECC problems according to the test log specifically comprises:
detecting whether a repairable flag field exists in the test log and, if so, determining that the ECC problem is a repairable ECC problem;
and detecting whether an unrepairable flag field exists in the test log and, if so, determining that the ECC problem is an unrepairable ECC problem.
The fifth field in sel.log distinguishes a repairable ECC problem from an unrepairable one: the repairable flag field is 0xa0, and the unrepairable flag field is 0xa1.
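The classification rule above can be sketched as follows. The patent only states that the fifth field carries the flag (0xa0 or 0xa1); the field separator `|`, the layout of the sample line, and the function name are assumptions made for illustration.

```python
REPAIRABLE_FLAG = "0xa0"    # from the description: repairable ECC problem
UNREPAIRABLE_FLAG = "0xa1"  # from the description: unrepairable ECC problem


def classify_sel_line(line, sep="|"):
    """Classify one saved SEL line by the flag found in its fifth field.

    Returns "repairable", "unrepairable", or None when the line has fewer
    than five fields or carries neither flag. The separator is an assumption.
    """
    fields = [field.strip().lower() for field in line.split(sep)]
    if len(fields) < 5:
        return None
    tokens = fields[4].split()
    if UNREPAIRABLE_FLAG in tokens:
        return "unrepairable"
    if REPAIRABLE_FLAG in tokens:
        return "repairable"
    return None


# Hypothetical line layout, for illustration only.
kind = classify_sel_line("1 | 02/26/2021 | 10:00:01 | Memory #0x12 | 0xa0 | Asserted")
```

Matching whole tokens rather than substrings avoids misreading a longer value such as 0xa01 as the repairable flag.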
In step S3, the memory with a repairable ECC problem is located and the number of error reports for that memory is counted. Locating means narrowing the repairable ECC problem down to a specific memory module (DIMM); one implementation is a third command that lights a locator indicator, for example: ipmitool raw 0x3a 0xb1 dimm 1. If the system is down, a fourth command can be sent to light the locator indicator: ipmitool -I lanplus -H ip -U user -P password raw 0x3a 0xb1 dimm 1.
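Counting error reports per memory module, as step S3 requires, is a simple tally once each log entry has been reduced to a module label and a problem type. The sketch below assumes that reduction has already happened; the `(dimm, kind)` pair format and the labels are hypothetical.

```python
from collections import Counter


def count_repairable_errors(events):
    """Count repairable ECC error reports per memory module.

    `events` is an iterable of (dimm_label, kind) pairs, where kind is
    "repairable" or "unrepairable". Unrepairable events are not counted
    here because they follow a separate analysis path.
    """
    return Counter(dimm for dimm, kind in events if kind == "repairable")


# Illustrative events only.
counts = count_repairable_errors([
    ("dimm1", "repairable"),
    ("dimm1", "repairable"),
    ("dimm2", "unrepairable"),
    ("dimm3", "repairable"),
])
```

A `Counter` returns 0 for modules that never reported a repairable error, so the per-module count can be compared with the preset value without special-casing missing keys.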
In steps S4 to S6, it is determined whether the number of memory error reports exceeds a preset value. For a repairable ECC problem: if the number of error reports does not exceed the preset value, the error may have another cause (not the memory module itself), so the client server system is restarted to test whether the memory error occurs again; if it does not, the repairable ECC problem is ignored; if it does, the memory module with the repairable ECC problem can be repaired by modifying the BIOS option. If the number of error reports exceeds the preset value, the BIOS options can be exported with a BIOS tool and the memory enhancement test option (Enhanced Memory Test) changed to the test-and-repair option (Test and Repair); after the modification, the options are imported back into the BIOS, the server is restarted once the BIOS option has been modified successfully, and the repair runs automatically during the restart. After the repair is complete, the server automatically enters the operating system (OS).
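The decision logic of steps S4 to S6 can be written down compactly. This is a minimal sketch: the enum and function names are hypothetical, and "exceeds" is read as strictly greater than the preset value.

```python
from enum import Enum


class Action(Enum):
    REPAIR_VIA_BIOS = "repair via BIOS option"
    RESTART_AND_RETEST = "restart and retest"
    IGNORE = "ignore"


def first_action(error_count: int, preset_value: int) -> Action:
    """S4: repair when the count exceeds the preset value, otherwise retest."""
    if error_count > preset_value:
        return Action.REPAIR_VIA_BIOS
    return Action.RESTART_AND_RETEST


def action_after_retest(error_recurred: bool) -> Action:
    """After a restart and retest: repair if the error came back, else ignore it."""
    return Action.REPAIR_VIA_BIOS if error_recurred else Action.IGNORE
```

Separating the two decisions keeps the retest path explicit: a low count alone never triggers a BIOS change, only a recurrence after restart does.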
Specifically, the BIOS tool may be the SCELNX_64 tool or another tool; the invention is not limited in this respect. The BIOS (Basic Input Output System) needs to support the memory enhancement function.
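The export, modify, confirm, and import sequence can be illustrated on the exported text alone. The sketch below assumes a deliberately simplified, hypothetical export format with one `Name = Value` line per option; real BIOS tools use their own formats, and the export and import commands themselves are not shown. The function names and the initial value `Enabled` are also assumptions.

```python
def set_option(exported_text: str, name: str, new_value: str) -> str:
    """Return the exported options text with `name` set to `new_value`.

    Assumes a hypothetical "Name = Value" line format. Raises KeyError when
    the option is absent, e.g. when the BIOS lacks the memory enhancement
    function.
    """
    out, found = [], False
    for line in exported_text.splitlines():
        key, sep, _ = line.partition("=")
        if sep and key.strip() == name:
            line = f"{key.rstrip()} = {new_value}"
            found = True
        out.append(line)
    if not found:
        raise KeyError(name)
    return "\n".join(out)


def get_option(text: str, name: str):
    """Read an option back, used to confirm the modification before rebooting."""
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == name:
            return value.strip()
    return None


exported = "Enhanced Memory Test = Enabled\nQuiet Boot = Disabled"
modified = set_option(exported, "Enhanced Memory Test", "Test and Repair")
confirmed = get_option(modified, "Enhanced Memory Test") == "Test and Repair"
```

The read-back corresponds to confirming that the BIOS option was modified successfully; only when it succeeds would the file be imported and the server restarted.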
As shown in FIG. 3, the method for locating and repairing a memory ECC problem according to the present invention further includes:
S7, locating the memory with an unrepairable ECC problem, and analyzing the cause of the problem.
In step S7, the memory with the unrepairable ECC problem is located, it is determined which part of the memory causes the problem, and after the corresponding hardware is replaced, the test is run again to determine the cause of the unrepairable ECC problem.
It should be noted that, in the technical solution of the present invention, steps S1 to S7 can all be implemented by programming in a programming language, and the programming idea corresponds to the steps of the present invention, and can also be implemented in other ways, and the present invention is not limited herein.
The ECC problem generated in the memory test is classified into the recoverable ECC problem and the unrepairable ECC problem, the memory bank with the recoverable ECC problem is repaired by modifying the BIOS option, the quality of the server memory is improved, and the reliability of the ECC problem test of the server memory is effectively improved.
According to the technical scheme, the memory bank with ECC problems can be automatically repaired in the restarting process of the server by modifying the BIOS option, and the repairing efficiency of the memory bank is improved.
According to the technical scheme, if the error reporting times of the memory do not exceed the preset value, the client server is restarted, the memory ECC problem test is carried out again, error positioning caused by the ECC problem not caused by the memory is avoided, and the reliability of the memory ECC problem test is improved.
Example two
As shown in FIG. 4, the technical solution of the present invention further provides a device for locating and repairing a memory ECC problem, including:
the test module 101, used for communicatively connecting the client servers to be tested to the server-side server, the server-side server controlling a plurality of client servers to be tested to perform memory ECC problem testing;
the dividing module 102, used for the server-side server to obtain a test log of each client server to be tested and to divide the memory ECC problems into repairable ECC problems and unrepairable ECC problems according to the test log;
and the locating and repairing module 103, used for locating the memory with a repairable ECC problem, counting the number of error reports for that memory, and, if the number of error reports exceeds a preset value, automatically repairing the located memory module with the repairable ECC problem by modifying the BIOS option.
The ECC problem generated in the memory test is classified into the recoverable ECC problem and the unrepairable ECC problem, the memory bank with the recoverable ECC problem is repaired by modifying the BIOS option, the quality of the server memory is improved, and the reliability of the ECC problem test of the server memory is effectively improved.
According to the technical scheme, the BIOS option is modified, the memory bank with ECC problems can be automatically repaired in the restarting process of the server, and the efficiency of repairing the memory bank is improved.
According to the technical scheme, if the error reporting times of the memory do not exceed the preset value, the client server is restarted, the memory ECC problem test is carried out again, error positioning caused by the ECC problem not caused by the memory is avoided, and the reliability of the memory ECC problem test is improved.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, it is not intended to limit the scope of the present invention, and it should be understood by those skilled in the art that various modifications and variations can be made without inventive efforts by those skilled in the art based on the technical solution of the present invention.

Claims (9)

1. A method for locating and repairing memory ECC problems, characterized by comprising the following steps:
communicatively connecting the client servers to be tested to a server-side server, the server-side server controlling a plurality of client servers to be tested to perform memory ECC problem testing;
obtaining, by the server-side server, a test log of each client server to be tested, and dividing the memory ECC problems into repairable ECC problems and unrepairable ECC problems according to the test log;
locating the memory with a repairable ECC problem, counting the number of error reports for that memory, and, if the number of error reports exceeds a preset value, automatically repairing the located memory module with the repairable ECC problem by modifying a BIOS option; wherein automatically repairing the located memory module with the repairable ECC problem by modifying the BIOS option specifically includes:
exporting the BIOS options with a BIOS tool, and changing the memory enhancement test option in the BIOS options to the test-and-repair option;
and, after the modification is complete, importing the options back into the BIOS, confirming that the BIOS option was modified successfully, and restarting the server so that the repair runs automatically.
2. The method of claim 1, wherein communicatively connecting the client servers to be tested to the server-side server, with the server-side server controlling the plurality of client servers to be tested to perform memory ECC problem testing, specifically comprises:
building a network test environment, and connecting each client server to be tested and the server-side server to the same switch, so that each client server to be tested and the server-side server are in the same network segment;
configuring an operating system and a kernel on the server-side server, and connecting each client server to be tested to the server-side server by PXE (Preboot eXecution Environment) boot;
setting each client server to start the test automatically at boot, simulating the user's actual usage scenario, and performing the memory ECC problem test;
during the test, if a memory MCE error occurs, terminating the test and recording a test log.
3. The method of claim 1, wherein dividing the memory ECC problems into repairable ECC problems and unrepairable ECC problems according to the test log comprises:
detecting whether a repairable flag field exists in the test log and, if so, determining that the ECC problem is a repairable ECC problem;
and detecting whether an unrepairable flag field exists in the test log and, if so, determining that the ECC problem is an unrepairable ECC problem.
4. The method of claim 3, wherein the repairable flag field is 0xa0 and the non-repairable flag field is 0xa1.
5. The method for locating and repairing the memory ECC problem of claim 1, wherein the memory enhancement test option is Enhanced Memory Test, and the test-and-repair option is Test and Repair.
6. The method of claim 1, wherein the BIOS supports memory enhancement.
7. The method of claim 1, further comprising: and if the error reporting times of the memory do not exceed the preset value, restarting the client server and carrying out the memory ECC problem test again.
8. The method of claim 1, further comprising: locating the memory with an unrepairable ECC problem and analyzing the cause of the problem.
9. A device for locating and repairing memory ECC problems, characterized by comprising:
a test module, used for communicatively connecting the client servers to be tested to the server-side server, the server-side server controlling a plurality of client servers to be tested to perform memory ECC problem testing;
a dividing module, used for the server-side server to obtain a test log of each client server to be tested and to divide the memory ECC problems into repairable ECC problems and unrepairable ECC problems according to the test log;
a locating and repairing module, used for locating the memory with a repairable ECC problem, counting the number of error reports for that memory, and, if the number of error reports exceeds a preset value, automatically repairing the located memory module with the repairable ECC problem by modifying a BIOS option; wherein automatically repairing the located memory module with the repairable ECC problem by modifying the BIOS option specifically includes:
exporting the BIOS options with a BIOS tool, and changing the memory enhancement test option in the BIOS options to the test-and-repair option;
and, after the modification is complete, importing the options back into the BIOS, confirming that the BIOS option was modified successfully, and restarting the server so that the repair runs automatically.
CN202110219990.8A 2021-02-26 2021-02-26 Method and device for positioning and repairing memory ECC problem Active CN112948160B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110219990.8A CN112948160B (en) 2021-02-26 2021-02-26 Method and device for positioning and repairing memory ECC problem

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110219990.8A CN112948160B (en) 2021-02-26 2021-02-26 Method and device for positioning and repairing memory ECC problem

Publications (2)

Publication Number Publication Date
CN112948160A CN112948160A (en) 2021-06-11
CN112948160B true CN112948160B (en) 2023-02-28

Family

ID=76246572

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110219990.8A Active CN112948160B (en) 2021-02-26 2021-02-26 Method and device for positioning and repairing memory ECC problem

Country Status (1)

Country Link
CN (1) CN112948160B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103218275A (en) * 2013-03-28 2013-07-24 华为技术有限公司 Data error repairing method, device and equipment
WO2016106965A1 (en) * 2014-12-31 2016-07-07 中兴通讯股份有限公司 Server self-healing method and device
CN109101383A (en) * 2018-08-09 2018-12-28 郑州云海信息技术有限公司 A kind of test method and system of memory detection
WO2020015203A1 (en) * 2018-07-20 2020-01-23 华为技术有限公司 System recovery method and device
CN112131039A (en) * 2020-09-18 2020-12-25 苏州浪潮智能科技有限公司 Memory ECC information reporting control method, device, equipment and medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100498715C (en) * 2006-05-20 2009-06-10 技嘉科技股份有限公司 Method for simulating IPMI by BIOS
TWI685751B (en) * 2018-04-10 2020-02-21 神雲科技股份有限公司 Error reporting function control method for server device
US10783025B2 (en) * 2018-10-15 2020-09-22 Dell Products, L.P. Method and apparatus for predictive failure handling of interleaved dual in-line memory modules
TWI709039B (en) * 2019-04-25 2020-11-01 神雲科技股份有限公司 Server and method for controlling error event log recording
CN110489259B (en) * 2019-07-29 2023-03-24 深圳中电长城信息安全系统有限公司 Memory fault detection method and equipment
CN111124722B (en) * 2019-10-30 2022-11-29 苏州浪潮智能科技有限公司 Method, equipment and medium for isolating fault memory

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
电脑内存最新优化与维护技巧及常见问题排解;未知;《电脑编程技巧与维护》;20090603(第11期);第92页-93页 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant