CN105204968B - A kind of failure memory detection method and device - Google Patents

Info

Publication number
CN105204968B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510763358.4A
Other languages
Chinese (zh)
Other versions
CN105204968A (en)
Inventor
常现超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Beijing Electronic Information Industry Co Ltd
Original Assignee
Inspur Beijing Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Beijing Electronic Information Industry Co Ltd
Priority to CN201510763358.4A
Publication of CN105204968A
Application granted
Publication of CN105204968B
Legal status: Active
Anticipated expiration

Landscapes

  • Techniques For Improving Reliability Of Storages (AREA)
  • Debugging And Monitoring (AREA)

Abstract

Embodiments of the present invention provide a fault memory detection method and device. The method includes: monitoring the memory running state in real time and, when a memory fault is detected, generating fault information that includes the physical address of the faulty memory; acquiring the fault information and obtaining the physical address of the faulty memory from it; obtaining, through the system kernel, all slots complying with the PCI, PCI-X or PCI-E standard, and parsing the running information of all memories placed in the slots to obtain the physical address change ranges of those memories; and locating the faulty memory according to its physical address and the physical address change ranges of all memories placed in the slots. This ensures the correctness of the search result and improves working efficiency, so that memory faults can be found and handled in time, damage to application services caused by memory faults is reduced, and the stability and reliability of the system are improved.

Description

Fault memory detection method and device
Technical Field
The invention relates to the field of computer application, in particular to a fault memory detection method and device.
Background
With the rapid development of computer technology and integrated circuit technology, computers have improved rapidly in both software and hardware. As the amount of computer hardware increases, so does its failure rate, especially for memory: applications demand ever more memory to improve performance, more memory modules are installed in each computer, and the probability of a memory failure rises accordingly. If one memory module in a group fails, it may still be used by a service program, so that the service becomes unstable or data is even corrupted, resulting in heavy losses. At present, when a memory fails, memory information is obtained manually from a database and analyzed in order to find the failed memory.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and an apparatus for detecting a faulty memory, so as to solve the following problems in the prior art: when a memory fails, memory information is obtained manually from a database and analyzed in order to find the faulty memory, so the accuracy of the result cannot be guaranteed and the search is inefficient; as a result, the memory failure cannot be handled in a timely and effective manner, application services are damaged, and the stability and reliability of the system are seriously affected.
In order to achieve the above purpose, the embodiments of the present invention provide the following technical solutions:
a method for detecting a fault memory comprises the following steps:
monitoring the running state of the memory in real time, generating fault information comprising a physical address of the fault memory when the fault of the memory is detected, acquiring the fault information, and obtaining the physical address of the fault memory according to the fault information;
acquiring all slots complying with PCI standards, PCI-X standards or PCI-E standards through a system kernel, analyzing and acquiring all running information of memories arranged on the slots, and acquiring all physical address change ranges of the memories arranged on the slots;
and positioning to obtain the fault memory according to the physical address of the fault memory and the physical address change range of all memories arranged on the slot.
Wherein, after the positioning to obtain the fault memory, the method further comprises:
performing a logical offline operation on the fault memory, and migrating the data in the fault memory to other normally operating memories.
When the fault memory detection method is used for a Linux system, the memory running state is monitored in real time through an mcelog program in the Linux system, and when a fault occurs in the memory, fault information including a physical address of the fault memory is generated through the mcelog program.
Wherein, after generating the fault information including the physical address of the fault memory, the method further comprises: saving the fault information in a register.
Wherein the acquiring the fault information comprises:
judging whether the register stores fault information;
and if the fault information is stored, acquiring the fault information stored in the register.
Wherein, the obtaining all slots complying with the PCI standard, the PCI-X standard or the PCI-E standard by the system kernel further comprises:
acquiring operation information of all the slots, and determining the currently-used slot in all the slots;
analyzing and acquiring all the operation information of the memories arranged on the slot in operation to obtain all the physical address change ranges of the memories arranged on the slot in operation.
Wherein, after the positioning to obtain the fault memory, the method further comprises:
issuing an alarm and generating a log file, wherein the alarm is an audible alarm and/or a flashing light alarm.
A fault memory detection device, comprising: a monitoring and acquiring unit, a slot obtaining unit and a positioning unit; wherein,
the monitoring and acquiring unit is used for monitoring the running state of the memory in real time, generating fault information comprising a physical address of the fault memory when the memory is detected to be in fault, acquiring the fault information and acquiring the physical address of the fault memory according to the fault information;
the slot obtaining unit is used for obtaining all slots complying with PCI standards, PCI-X standards or PCI-E standards through a system kernel, analyzing and obtaining operation information of all memories arranged on the slots, and obtaining physical address change ranges of all the memories arranged on the slots;
and the positioning unit is used for positioning to obtain the fault memory according to the physical address of the fault memory and the physical address change range of all memories arranged on the slot.
Wherein the fault memory detection device further comprises: a migration module, configured to perform a logical offline operation on the fault memory and migrate the data in the fault memory to other normally operating memories.
Wherein the fault memory detection device further comprises: a storage module, configured to store the fault information in a register after the fault information including the physical address of the fault memory is generated.
Based on the above technical solution, the method and the device for detecting a faulty memory provided in the embodiments of the present invention monitor the running state of the memory in real time and, when a memory fault is detected, generate fault information including the physical address of the faulty memory. The fault information is acquired and the physical address of the faulty memory is obtained from it. All slots complying with the PCI standard, PCI-X standard or PCI-E standard are then obtained through the system kernel, the running information of all memories placed in the slots is parsed to obtain the physical address change ranges of those memories, and the faulty memory is located according to its physical address and those ranges. Because the faulty memory is located from the physical address in the generated fault information and the physical address change ranges of the memories placed in the slots, rather than by manually obtaining memory information from a database, analyzing it and then searching for the faulty memory, the correctness of the search result is ensured and working efficiency is higher, so that memory faults can be found and handled in time, damage to application services is reduced, and the stability and reliability of the system are improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a flowchart of a method for detecting a faulty memory according to an embodiment of the present invention;
fig. 2 is another flowchart of a method for detecting a faulty memory according to an embodiment of the present invention;
fig. 3 is a flowchart of a method for obtaining failure information in a failure memory detection method according to an embodiment of the present invention;
fig. 4 is a flowchart illustrating a method for obtaining a physical address variation range of a memory disposed on a slot in a method for detecting a faulty memory according to an embodiment of the present invention;
fig. 5 is a system block diagram of a failure memory detection apparatus according to an embodiment of the present invention;
fig. 6 is another system block diagram of a faulty memory detection apparatus according to an embodiment of the present invention;
fig. 7 is a system block diagram of another apparatus for detecting a faulty memory according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a method for detecting a faulty memory according to an embodiment of the present invention. The method monitors the running state of the memory in real time, generates fault information including the physical address of the faulty memory when a memory fault is detected, and locates the faulty memory according to the physical address in the generated fault information and the physical address change ranges of all memories placed in the slots. The physical location of the faulty memory can thus be obtained accurately, which ensures the correctness of the search result and gives higher working efficiency, so that memory faults can be found and handled in time, damage to application services caused by memory faults is reduced, and the stability and reliability of the system are improved. Referring to fig. 1, the method for detecting a faulty memory may include:
step S100: monitoring the running state of the memory in real time, generating fault information comprising a physical address of the fault memory when the fault of the memory is detected, acquiring the fault information, and obtaining the physical address of the fault memory according to the fault information;
optionally, when the method for detecting a fault memory provided in the embodiment of the present invention is used in a Linux system, the memory operation state may be monitored in real time through an mcelog program in the Linux system, and when a fault occurs in the memory, fault information including a physical address of the fault memory is generated by the mcelog program.
Optionally, the failure information including the physical address of the failed memory is generated, and the failure information is further stored in a register.
Alternatively, when it is determined that the fault information is stored in the register, the fault information stored in the register may be acquired.
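As a minimal sketch of the last part of step S100 (obtaining the physical address from the fault information), the following assumes that the fault information is a text record containing an `ADDR` field followed by a hexadecimal value. The function name `extract_fault_address` and the sample record are hypothetical and only illustrative; the actual output of mcelog varies with its version and the processor.

```python
import re
from typing import Optional

# Matches the hexadecimal value that follows an "ADDR" field (assumed record layout).
_ADDR_FIELD = re.compile(r"\bADDR\s+(?:0x)?([0-9a-fA-F]+)\b")


def extract_fault_address(fault_record: str) -> Optional[int]:
    """Return the physical address carried in a fault record, or None if absent."""
    match = _ADDR_FIELD.search(fault_record)
    if match is None:
        return None
    return int(match.group(1), 16)


# Illustrative record only; it is not copied from a real machine.
sample_record = "CPU 0 BANK 8\nMISC 640738dd0009159c ADDR 240001000\n"
fault_address = extract_fault_address(sample_record)
```

Returning `None` when no address field is present lets the caller distinguish a record without a usable physical address from a fault at address zero.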
Step S110: acquiring all slots complying with PCI standards, PCI-X standards or PCI-E standards through a system kernel, analyzing and acquiring all running information of memories arranged on the slots, and acquiring all physical address change ranges of the memories arranged on the slots;
Optionally, after all slots complying with the PCI standard, the PCI-X standard, or the PCI-E standard are obtained through the system kernel, the running information of all the slots is acquired and the slots currently in use, that is, the running slots, are determined among them. Only the running information of the memories placed in the running slots is then parsed to obtain the physical address change ranges of those memories, which avoids collecting useless physical address change range data.
Step S120: and positioning to obtain the fault memory according to the physical address of the fault memory and the physical address change range of all memories arranged on the slot.
When the fault memory is located, the slot into which the fault memory is inserted is obtained; that is, locating the fault memory yields its physical position.
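The positioning in step S120 amounts to a range lookup: find the slot whose physical address change range contains the fault address. The sketch below assumes that each memory occupies one contiguous, inclusive address range (no interleaving across memories); the names `SlotRange` and `locate_faulty_slot` and the example ranges are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SlotRange:
    slot_name: str  # hypothetical label such as "slot0"
    start: int      # first physical address served by the memory in this slot
    end: int        # last physical address served (inclusive)


def locate_faulty_slot(
    fault_address: int, ranges: Sequence[SlotRange]
) -> Optional[SlotRange]:
    """Return the slot whose address range contains fault_address, if any."""
    for slot_range in ranges:
        if slot_range.start <= fault_address <= slot_range.end:
            return slot_range
    return None


GIB = 1 << 30
slot_ranges = [
    SlotRange("slot0", 0 * GIB, 8 * GIB - 1),
    SlotRange("slot1", 8 * GIB, 16 * GIB - 1),
]
faulty = locate_faulty_slot(9 * GIB + 0x1000, slot_ranges)
```

A linear scan is enough for the handful of slots in one machine; with many ranges, sorting by start address and using binary search would be a reasonable alternative.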
Optionally, after the fault memory is located, the located fault memory may be subjected to a logic offline operation, and data in the fault memory is migrated to another normal operating memory, so that the data is no longer used by the service program and the operating system, and normal operation of the system is ensured.
Optionally, after the logical offline operation has been performed on the located fault memory, the fault memory may be replaced; once it has been replaced, the replacement memory can be brought online at any time.
Optionally, if the operating information of all the slots is obtained after all the slots complying with the PCI standard, the PCI-X standard, or the PCI-E standard are obtained by the system kernel, and the currently-used slot in all the slots is determined, the faulty memory can be located and obtained only according to the obtained physical address of the faulty memory and the variation range of the physical address of the memory placed in the currently-used slot.
Optionally, after the fault memory is located, an alarm may be sent out, and a log file may be generated.
Optionally, after the fault memory is located, the alarm sent out may be a sound alarm, a flash alarm, or a combination of the sound alarm and the flash alarm.
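The alarm and log file can be sketched as follows. This is only an illustration under stated assumptions: the function name `report_fault`, the logger name and the message format are hypothetical, and the terminal bell character stands in for whatever audible or flashing alarm the hardware actually provides.

```python
import logging
import tempfile
from pathlib import Path


def report_fault(slot_name: str, fault_address: int, log_path: Path) -> str:
    """Append a fault entry to log_path and ring the terminal bell."""
    message = f"memory fault in {slot_name} at physical address {fault_address:#x}"
    logger = logging.getLogger("fault_memory_demo")  # hypothetical logger name
    handler = logging.FileHandler(log_path)
    logger.addHandler(handler)
    try:
        logger.error(message)
    finally:
        # Detach and close the handler so the file is flushed and released.
        logger.removeHandler(handler)
        handler.close()
    print("\a", end="")  # stand-in for an audible alarm
    return message


with tempfile.TemporaryDirectory() as tmp_dir:
    log_file = Path(tmp_dir) / "memory_fault.log"
    report_fault("slot1", 0x240001000, log_file)
    logged_text = log_file.read_text()
```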
Based on the above technical solution, the method for detecting a faulty memory according to the embodiments of the present invention monitors the running state of the memory in real time and, when a memory fault is detected, generates fault information including the physical address of the faulty memory. The fault information is acquired and the physical address of the faulty memory is obtained from it. All slots complying with the PCI standard, PCI-X standard or PCI-E standard are then obtained through the system kernel, the running information of all memories placed in the slots is parsed to obtain the physical address change ranges of those memories, and the faulty memory is located according to its physical address and those ranges. Because the faulty memory is located from the physical address in the generated fault information and the physical address change ranges of the memories placed in the slots, rather than by manually obtaining memory information from a database, analyzing it and then searching for the faulty memory, the correctness of the search result is ensured and working efficiency is higher.
Optionally, fig. 2 shows another flowchart of a method for detecting a faulty memory according to an embodiment of the present invention, and referring to fig. 2, the method for detecting a faulty memory may include:
step S200: monitoring the running state of the memory in real time, generating fault information comprising a physical address of the fault memory when the fault of the memory is detected, acquiring the fault information, and obtaining the physical address of the fault memory according to the fault information;
step S210: acquiring all slots complying with PCI standards, PCI-X standards or PCI-E standards through a system kernel, analyzing and acquiring all running information of memories arranged on the slots, and acquiring all physical address change ranges of the memories arranged on the slots;
step S220: positioning to obtain the fault memory according to the physical address of the fault memory and the physical address variation range of the memories arranged on the slot,
step S230: and carrying out logic off-line operation on the fault memory, and migrating the data in the fault memory to other normal operation memories.
After the fault memory is located, the logic off-line operation can be performed on the located fault memory, and the data in the fault memory is migrated to other normal operation memories, so that the data is not used by a service program and an operating system any more, and the normal operation of the system is ensured.
Optionally, after the logical offline operation has been performed on the located fault memory, the fault memory may be replaced; once it has been replaced, the replacement memory can be brought online at any time.
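On Linux, one possible way to realize the logical offline operation of step S230 is the kernel's memory hotplug interface in sysfs, where memory is managed in fixed-size blocks named `memoryN` and a block is taken offline by writing `offline` to its `state` file; the kernel attempts to migrate the pages in the block before offlining it. The patent does not name this interface, so the sketch below is an assumption. It only computes which `state` file corresponds to a fault address and deliberately does not write to it, since that requires root privileges and a kernel with memory hot-remove support. The helper name `memory_block_state_path` and the 128 MiB block size are illustrative.

```python
from pathlib import PurePosixPath

SYSFS_MEMORY = PurePosixPath("/sys/devices/system/memory")


def memory_block_state_path(
    fault_address: int, block_size_bytes: int
) -> PurePosixPath:
    """Return the sysfs 'state' file of the memory block containing fault_address."""
    block_id = fault_address // block_size_bytes
    return SYSFS_MEMORY / f"memory{block_id}" / "state"


# Block size assumed for illustration; on a real system it is read (as hex)
# from /sys/devices/system/memory/block_size_bytes.
block_size = 0x8000000  # 128 MiB
state_path = memory_block_state_path(0x240001000, block_size)
```

Note that a memory block is usually much smaller than a physical memory module, so taking a whole module out of service would mean offlining every block within its address range.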
Optionally, fig. 3 shows a flowchart of a method for obtaining fault information in the fault memory detection method provided in the embodiment of the present invention, and referring to fig. 3, the method for obtaining fault information may include:
step S300: storing the fault information in a register;
step S310: judging whether the register stores fault information;
if the fault information including the physical address of the fault memory is stored in the register after it is generated, whether a memory fault has occurred in the system can be determined by checking whether fault information is stored in the register.
Step S320: and if the fault information is stored, acquiring the fault information stored in the register.
If the fault information is stored in the register, the system is judged to have a memory fault, the fault information stored in the register is obtained, then the physical address of the fault memory is obtained according to the fault information, and the follow-up operation is continued; if the fault information is not stored in the register, the system is judged to have no memory fault, and the memory running state is continuously monitored.
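The flow of steps S310 and S320 can be sketched as a polling loop. Here the register is modeled as a hypothetical callable `read_register` that returns the stored fault address, or `None` when nothing is stored; `poll_fault_addresses` and `max_polls` are likewise illustrative names, and a real monitor would run indefinitely with a delay between polls.

```python
from typing import Callable, Iterator, Optional


def poll_fault_addresses(
    read_register: Callable[[], Optional[int]],
    max_polls: int,
) -> Iterator[int]:
    """Poll the register up to max_polls times, yielding each stored fault address."""
    for _ in range(max_polls):
        stored = read_register()  # step S310: is fault information stored?
        if stored is not None:
            yield stored          # step S320: acquire the stored fault information
        # Otherwise no memory fault has occurred; keep monitoring.


# Simulated register contents for four consecutive polls.
simulated_reads = iter([None, None, 0x240001000, None])
found = list(poll_fault_addresses(lambda: next(simulated_reads), max_polls=4))
```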
Optionally, fig. 4 shows a flowchart of a method for obtaining a physical address variation range of a memory disposed on a slot in a method for detecting a faulty memory according to an embodiment of the present invention, and referring to fig. 4, the method for obtaining the physical address variation range of the memory disposed on the slot may include:
step S400: acquiring all slots complying with PCI standard, PCI-X or PCI-E standard through a system kernel;
The PCI (Peripheral Component Interconnect) standard is a local bus standard introduced by Intel Corporation in 1991; a slot using the PCI standard transfers data over a 32-bit-wide bus.
The PCI-X interface is a newer version of the parallel PCI bus; a slot using the PCI-X standard transfers data over a 64-bit-wide bus.
The PCI-E (PCI Express) standard is a third-generation I/O (input/output) bus standard developed by Intel Corporation for defining local buses. Slots using the PCI-E standard differ according to their bus width.
Step S410: acquiring operation information of all the slots, and determining the currently-used slot in all the slots;
Not every slot has a memory inserted, and not every inserted memory is in use at all times. Therefore, after all the slots are obtained, the running information of all the slots is acquired and used to determine which slots are currently in use, that is, the running slots; the memories placed in the running slots are the memories that were running when the memory fault occurred.
Step S420: analyzing and acquiring all the operation information of the memories arranged on the slot in operation to obtain all the physical address change ranges of the memories arranged on the slot in operation.
A memory fault can be detected only while the memory is running, so the fault memory is always among the running memories. It is therefore sufficient to parse the running information of the memories placed in the running slots and obtain their physical address change ranges, which avoids collecting useless physical address change range data. The fault memory is finally located according to its physical address and the physical address change ranges of all memories placed in the running slots.
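Steps S410 and S420 can be sketched as a filter over the slot list that keeps only running slots and returns their address ranges. The `SlotInfo` structure, its fields and the example data are hypothetical; how the running information is actually read from the system kernel is not specified here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

AddressRange = Tuple[int, int]  # (start, end), both inclusive


@dataclass(frozen=True)
class SlotInfo:
    name: str
    in_use: bool                           # taken from the slot's running information
    address_range: Optional[AddressRange]  # None when no memory is inserted


def ranges_of_running_slots(slots: List[SlotInfo]) -> Dict[str, AddressRange]:
    """Keep only slots that are in use and hold a memory (steps S410 and S420)."""
    return {
        slot.name: slot.address_range
        for slot in slots
        if slot.in_use and slot.address_range is not None
    }


GIB = 1 << 30
all_slots = [
    SlotInfo("slot0", True, (0, 8 * GIB - 1)),
    SlotInfo("slot1", False, (8 * GIB, 16 * GIB - 1)),  # inserted but not in use
    SlotInfo("slot2", False, None),                     # empty slot
]
running_ranges = ranges_of_running_slots(all_slots)
```

Only the ranges returned here need to be compared with the fault address, which is the data reduction the paragraph above describes.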
The fault memory detection method provided by the embodiment of the invention monitors the running state of the memory in real time, generates fault information comprising the physical address of the fault memory when the memory is detected to have a fault, and locates and obtains the fault memory according to the physical address in the generated fault information and the change range of the physical addresses of all the memories arranged on the slots, thereby ensuring the correctness of a search result, having higher working efficiency, timely finding and effectively processing the memory fault, reducing the damage to application service caused by the memory fault, and improving the stability and reliability of the system.
In the following, the fault memory detection device provided by the embodiment of the present invention is introduced, and the fault memory detection device described below and the fault memory detection method described above may be referred to correspondingly.
Fig. 5 is a system block diagram of a faulty memory detection apparatus according to an embodiment of the present invention, and referring to fig. 5, the faulty memory detection apparatus may include: a monitoring and acquiring unit 100, a slot obtaining unit 200 and a positioning unit 300; wherein,
the monitoring and acquiring unit 100 is configured to monitor an operating state of a memory in real time, generate fault information including a physical address of the fault memory when a fault occurs in the memory, acquire the fault information, and obtain the physical address of the fault memory according to the fault information;
a slot obtaining unit 200, configured to obtain, through a system kernel, all slots complying with the PCI standard, the PCI-X standard, or the PCI-E standard, analyze and obtain all operation information of the memories disposed in the slots, and obtain all physical address variation ranges of the memories disposed in the slots;
and a positioning unit 300, configured to obtain the faulty memory by positioning according to the physical address of the faulty memory and the physical address variation range of all memories disposed on the slot.
Optionally, fig. 6 shows another system block diagram of the faulty memory detection apparatus according to the embodiment of the present invention, and referring to fig. 6, the faulty memory detection apparatus may further include: a migration module 400.
The migration module 400 is configured to perform a logic offline operation on the faulty memory, and migrate data in the faulty memory to other normally operating memories.
Optionally, fig. 7 shows another system block diagram of the faulty memory detection apparatus according to the embodiment of the present invention, and referring to fig. 7, the faulty memory detection apparatus may further include: a storage module 500.
The storage module 500 is configured to store the fault information in a register after the fault information including the physical address of the faulty memory is generated.
The fault memory detection device provided by the embodiment of the invention monitors the running state of the memory in real time, generates fault information comprising the physical address of the fault memory when the memory is detected to have a fault, and locates and obtains the fault memory according to the physical address in the generated fault information and the change range of the physical addresses of all the memories arranged on the slots, thereby ensuring the correctness of a search result, having higher working efficiency, timely finding and effectively processing the memory fault, reducing the damage to application service caused by the memory fault, and improving the stability and reliability of the system.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for detecting a fault memory is characterized by comprising the following steps:
monitoring the running state of the memory in real time, generating fault information comprising a physical address of the fault memory when the fault of the memory is detected, acquiring the fault information, and obtaining the physical address of the fault memory according to the fault information;
acquiring all slots complying with PCI standards, PCI-X standards or PCI-E standards through a system kernel, analyzing and acquiring all running information of memories arranged on the slots, and acquiring all physical address change ranges of the memories arranged on the slots;
and positioning to obtain the fault memory according to the physical address of the fault memory and the physical address change range of all memories arranged on the slot.
2. The method according to claim 1, wherein after the positioning to obtain the fault memory, the method further comprises:
and carrying out logic off-line operation on the fault memory, and migrating the data in the fault memory to other normal operation memories.
3. The method according to claim 1, wherein when the method is applied to a Linux system, the memory running state is monitored in real time by an mcelog program in the Linux system, and when a memory fault is detected, fault information including a physical address of the fault memory is generated by the mcelog program.
4. The method according to claim 1, wherein after the generating the fault information including the physical address of the fault memory, the method further comprises: saving the fault information in a register.
5. The method according to claim 4, wherein the acquiring the failure information comprises:
judging whether the register stores fault information;
and if the fault information is stored, acquiring the fault information stored in the register.
6. The method according to claim 1, wherein the obtaining, by the system kernel, all slots compliant with the PCI standard, the PCI-X standard, or the PCI-E standard further comprises:
acquiring operation information of all the slots, and determining the currently-used slot in all the slots;
analyzing and acquiring all the operation information of the memories arranged on the slot in operation to obtain all the physical address change ranges of the memories arranged on the slot in operation.
7. The method according to claim 1, wherein after the positioning to obtain the fault memory, the method further comprises:
issuing an alarm and generating a log file, wherein the alarm is an audible alarm and/or a flashing light alarm.
8. A fault memory detection device, comprising: a monitoring and acquiring unit, a slot obtaining unit and a positioning unit; wherein,
the monitoring and acquiring unit is used for monitoring the running state of the memory in real time, generating fault information comprising a physical address of the fault memory when the memory is detected to be in fault, acquiring the fault information and acquiring the physical address of the fault memory according to the fault information;
the slot obtaining unit is used for obtaining all slots complying with PCI standards, PCI-X standards or PCI-E standards through a system kernel, analyzing and obtaining operation information of all memories arranged on the slots, and obtaining physical address change ranges of all the memories arranged on the slots;
and the positioning unit is used for locating the fault memory according to the physical address of the fault memory and the physical address change ranges of all memories arranged on the slots.
9. The failing memory detection device of claim 8, further comprising: a migration module, used for performing a logical offline operation on the fault memory and migrating the data in the fault memory to other normally operating memories.
10. The failing memory detection device of claim 8, further comprising: a storage module, used for storing the fault information in a register after the fault information comprising the physical address of the fault memory is generated.
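The locating step recited in claims 1 and 8 amounts to checking which slot's physical address range contains the reported fault address. The following is a minimal Python sketch of that comparison only, not the patented implementation. The `SlotRange` record, the `locate_fault_memory` function, the slot labels and the address values are all hypothetical names and data chosen for illustration; how the ranges are actually parsed from the slot information is not shown.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SlotRange:
    """Physical address range of the memory in one slot (hypothetical record)."""
    slot: str
    start: int  # first physical address covered, inclusive
    end: int    # last physical address covered, inclusive


def locate_fault_memory(fault_addr: int, ranges: Sequence[SlotRange]) -> Optional[str]:
    """Return the slot whose range contains fault_addr, or None if none matches."""
    for r in ranges:
        if r.start <= fault_addr <= r.end:
            return r.slot
    return None


# Illustrative values only; real ranges would come from the parsed slot information.
SLOTS = [
    SlotRange("DIMM_A1", 0x0000_0000, 0x3FFF_FFFF),
    SlotRange("DIMM_A2", 0x4000_0000, 0x7FFF_FFFF),
    SlotRange("DIMM_B1", 0x8000_0000, 0xBFFF_FFFF),
]

# Fault address as it might be extracted from the fault information (assumed value).
print(locate_fault_memory(0x5A3C_1000, SLOTS))
```

A linear scan is sufficient here because the number of slots on a board is small. Returning `None` when no range matches lets the caller distinguish an unmapped address from a located module.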

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510763358.4A CN105204968B (en) 2015-11-10 2015-11-10 A kind of failure memory detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510763358.4A CN105204968B (en) 2015-11-10 2015-11-10 A kind of failure memory detection method and device

Publications (2)

Publication Number Publication Date
CN105204968A CN105204968A (en) 2015-12-30
CN105204968B true CN105204968B (en) 2019-05-10

Family

ID=54952662

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510763358.4A Active CN105204968B (en) 2015-11-10 2015-11-10 A kind of failure memory detection method and device

Country Status (1)

Country Link
CN (1) CN105204968B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106055438B (en) * 2016-05-27 2019-12-03 深圳市同泰怡信息技术有限公司 The method and system of memory bar exception on a kind of quick positioning mainboard
CN106126364A (en) * 2016-06-28 2016-11-16 浪潮(北京)电子信息产业有限公司 A kind of fault event memory collection method based on Linux system and system
CN106201750A (en) * 2016-06-28 2016-12-07 浪潮(北京)电子信息产业有限公司 A kind of processing method and processing device based on linux EMS memory error
CN106126368A (en) * 2016-08-22 2016-11-16 浪潮电子信息产业股份有限公司 Method for analyzing memory fault address under LINUX
CN107092549A (en) * 2017-04-26 2017-08-25 郑州云海信息技术有限公司 A kind of automatic monitoring and the instrument and method for parsing memory failure
CN109408273A (en) * 2018-11-13 2019-03-01 郑州云海信息技术有限公司 A kind of failure memory of eliminating is to the method and device of systematic influence
CN115292113B (en) * 2022-09-30 2023-01-06 新华三信息技术有限公司 Method and device for fault detection of internal memory of server and electronic equipment
CN115932532B (en) * 2023-03-09 2023-07-25 长鑫存储技术有限公司 Storage method, device, equipment and medium for physical address of fault storage unit

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102799506A (en) * 2012-06-29 2012-11-28 浪潮电子信息产业股份有限公司 Method for positioning fault memory
CN103198000A (en) * 2013-04-02 2013-07-10 浪潮电子信息产业股份有限公司 Method for positioning faulted memory in linux system
CN103197999A (en) * 2013-03-22 2013-07-10 北京百度网讯科技有限公司 Method and device for automatically positioning internal memory fault
CN103514068A (en) * 2012-06-28 2014-01-15 北京百度网讯科技有限公司 Method for automatically locating internal storage faults

Also Published As

Publication number Publication date
CN105204968A (en) 2015-12-30

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant