CN102609350A - Server memory failure alarm method - Google Patents

Info

Publication number
CN102609350A
Authority
CN
China
Prior art keywords
memory
error
failure
error message
alarm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2012100332686A
Other languages
Chinese (zh)
Inventor
平原
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Electronic Information Industry Co Ltd
Original Assignee
Inspur Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Electronic Information Industry Co Ltd
Priority to CN2012100332686A
Publication of CN102609350A

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The invention provides a server memory failure alarm method. A software program recognizes memory error messages during the startup stage of a server system; the messages are passed to a management chip embedded on the mainboard, which classifies them and raises alarms by level. The system comprises a failure information recognition unit, an error message database and an alarm unit. The failure information recognition unit obtains the error messages sent by the memory components in the system, as the basis for judging memory failures; the error message database collects the memory error messages passed to it; and the alarm unit selects a failure alarm mode according to the error message. Implementing this memory failure alarm method in a server system can greatly improve system reliability, ease maintenance, and improve the overall image of the product.

Description

A server memory failure alarm method
Technical background
In current server systems, the prior art relies only on the memory component triggering a hardware fault signal, with an on-board LED indicating the failure. This kind of design has the following shortcomings:
1. Failure information cannot be recorded: once the system loses power, the failure information found during that boot is lost;
2. The failure types the system can recognize are limited: only simple errors that the memory component can detect by itself are supported, for example excessive memory temperature or an excessive number of recorded I/O errors. For errors that the memory component cannot detect or report by itself, such as a memory chip fault or a memory configuration error, the server system cannot raise an alarm;
3. Alarms cannot be graded by failure severity.
Summary of the invention
A software program recognizes memory error messages during the startup stage of the server system and passes them to the management chip embedded on the mainboard, which classifies them and raises alarms by level. The system comprises a failure information recognition unit (1), an error message database (2), and an alarm unit (3), wherein:
The failure information recognition unit (1) obtains the error messages sent by the memory components in the system, as the basis for judging memory failures;
The error message database (2) collects the memory error messages passed to it;
The alarm unit (3) selects a failure alarm mode according to the error message;
The alarm flow is as follows:
When the system powers on, check whether the error message database contains historical errors, and detect whether the failure still exists, wherein:
1) if the failure exists, classify the error messages into failure types and raise alarms in different ways according to the failure level;
2) if the failure does not exist, detect whether the memory sends error messages during the current boot: a) if the memory sends error messages, record them in the error message database, classify them into failure types, and raise alarms according to the failure type; b) if the memory sends no error messages, clear the historical data in the error message database.
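The alarm flow above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `ErrorDatabase` class, the `boot_alarm_flow` function, the callback parameters, and the returned labels are hypothetical names chosen for this sketch and do not come from the patent, which gives no implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ErrorDatabase:
    """In-memory stand-in for the error message database (hypothetical)."""
    records: List[str] = field(default_factory=list)

    def has_history(self) -> bool:
        return bool(self.records)

    def record(self, messages: List[str]) -> None:
        self.records.extend(messages)

    def clear(self) -> None:
        self.records.clear()


def boot_alarm_flow(
    db: ErrorDatabase,
    fault_still_present: Callable[[List[str]], bool],
    current_boot_errors: List[str],
    raise_alarm: Callable[[List[str]], None],
) -> str:
    """Run the power-on flow once and return a label for the branch taken."""
    # Step 1: a historical failure is recorded and still exists -> alarm on it.
    if db.has_history() and fault_still_present(db.records):
        raise_alarm(list(db.records))
        return "alarm_history"
    # Step 2a: the memory reported errors during this boot -> record, then alarm.
    if current_boot_errors:
        db.record(current_boot_errors)
        raise_alarm(list(current_boot_errors))
        return "alarm_new"
    # Step 2b: nothing wrong now -> clear the historical data.
    db.clear()
    return "cleared"
```

Passing the failure check and the alarm action in as callbacks keeps the branching logic separate from hardware access, which the patent leaves to the BIOS and the BMC.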
The beneficial effect of the invention is as follows: the alarm unit is included, as a software process, in the monitoring program in the BMC. It can grade failure alarms according to the error data recorded in the error message database, and raises graded alarms through the Debug digit display, an LED, or a buzzer according to the severity level of the error.
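A minimal sketch of graded alarming might look like the following. The patent names the three alarm devices but does not say which severity level uses which device, nor how messages are classified, so the `Severity` levels, the `CHANNELS` mapping, and the keyword `RULES` below are all assumptions made purely for illustration.

```python
from enum import IntEnum
from typing import List


class Severity(IntEnum):
    """Hypothetical severity levels; the patent does not enumerate them."""
    MINOR = 1
    MAJOR = 2
    CRITICAL = 3


# Assumed mapping from severity to alarm devices (not specified in the patent).
CHANNELS = {
    Severity.MINOR: ["debug_display"],
    Severity.MAJOR: ["debug_display", "led"],
    Severity.CRITICAL: ["debug_display", "led", "buzzer"],
}

# Assumed keyword rules for classifying an error message.
RULES = [
    ("chip fault", Severity.CRITICAL),
    ("configuration", Severity.MAJOR),
    ("temperature", Severity.MAJOR),
]


def classify(message: str) -> Severity:
    """Return the severity of the first matching rule, or MINOR if none match."""
    text = message.lower()
    for keyword, severity in RULES:
        if keyword in text:
            return severity
    return Severity.MINOR


def select_channels(messages: List[str]) -> List[str]:
    """Pick alarm devices for the most severe message; no messages, no alarm."""
    if not messages:
        return []
    worst = max(classify(m) for m in messages)
    return CHANNELS[worst]
```

Alarming on the worst recorded error is one possible policy; a real monitoring program could equally drive each device independently.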
Implementing this memory failure alarm method in a server system can greatly improve system reliability, ease maintenance, and improve the overall image of the product.
Description of drawings
Fig. 1 is the alarm flowchart of the invention.
Embodiment
The alarm method of the invention is described in detail below with reference to the accompanying drawing.
In the method of the invention, the failure information recognition unit is included in the BIOS as a software process. At system startup it queries the error message database for historical error messages and checks whether memory error messages occur during the current boot.
Through software detection, the failure information recognition unit can recognize not only (1) hardware fault signals sent by the memory component itself, but also (2) errors that the memory component cannot signal by itself, such as a memory chip fault or a memory configuration error. Through a monitoring channel, the failure information recognition unit (1) can determine whether memory error messages exist in the error message database (2) or in the current startup process.
The error message database is stored in the Flash of the management chip (BMC) embedded on the mainboard. When the system loses power, the error messages remain in Flash and are not lost, so that on the next boot the system can check for the memory failures found previously.
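The persistence property described here can be illustrated with a small sketch in which an ordinary JSON file stands in for the BMC Flash. The `PersistentErrorLog` class and its on-disk format are hypothetical; real BMC firmware would use its own Flash storage interface, which the patent does not describe.

```python
import json
import os
import tempfile
from typing import List


class PersistentErrorLog:
    """Error message store that survives restarts (a file stands in for Flash)."""

    def __init__(self, path: str) -> None:
        self.path = path

    def load(self) -> List[str]:
        """Return all recorded messages, or an empty list if none were stored."""
        if not os.path.exists(self.path):
            return []
        with open(self.path, "r", encoding="utf-8") as f:
            return json.load(f)

    def append(self, message: str) -> None:
        records = self.load()
        records.append(message)
        self._write(records)

    def clear(self) -> None:
        self._write([])

    def _write(self, records: List[str]) -> None:
        # Write to a temporary file, flush it to disk, then atomically replace
        # the old file so a power cut cannot leave a half-written log behind.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(records, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)
```

Creating a second `PersistentErrorLog` on the same path models the next boot: the messages written before the simulated power loss are still readable.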

Claims (1)

1. A server memory failure alarm method, characterized in that a software program recognizes memory error messages during the startup stage of the server system and passes them to the management chip embedded on the mainboard, which classifies them and raises alarms by level; the system comprises a failure information recognition unit, an error message database, and an alarm unit, wherein:
The failure information recognition unit obtains the error messages sent by the memory components in the system, as the basis for judging memory failures;
The error message database collects the memory error messages passed to it;
The alarm unit selects a failure alarm mode according to the error message;
The alarm steps are as follows:
When the system powers on, check whether the error message database contains historical errors, and detect whether the failure still exists, wherein:
1) if the failure exists, classify the error messages into failure types and raise alarms in different ways according to the failure level;
2) if the failure does not exist, detect whether the memory sends error messages during the current boot, comprising: a) if the memory sends error messages, record them in the error message database, classify them into failure types, and raise alarms according to the failure type; b) if the memory sends no error messages, clear the historical data in the error message database.
CN2012100332686A 2012-02-15 2012-02-15 Server memory failure alarm method Pending CN102609350A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2012100332686A CN102609350A (en) 2012-02-15 2012-02-15 Server memory failure alarm method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2012100332686A CN102609350A (en) 2012-02-15 2012-02-15 Server memory failure alarm method

Publications (1)

Publication Number Publication Date
CN102609350A true CN102609350A (en) 2012-07-25

Family

ID=46526740

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2012100332686A Pending CN102609350A (en) 2012-02-15 2012-02-15 Server memory failure alarm method

Country Status (1)

Country Link
CN (1) CN102609350A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103019898A (en) * 2012-11-26 2013-04-03 加弘科技咨询(上海)有限公司 Error reporting system for memory module detection and slot position traffic light positioning
CN103077103A (en) * 2013-01-18 2013-05-01 浪潮电子信息产业股份有限公司 Off-line diagnosing method for server faults
CN103500133A (en) * 2013-09-17 2014-01-08 华为技术有限公司 Fault locating method and device
CN103995768A (en) * 2014-06-10 2014-08-20 浪潮电子信息产业股份有限公司 Visual quick diagnosing method of server faults
CN104021054A (en) * 2014-06-11 2014-09-03 浪潮(北京)电子信息产业有限公司 Server fault visual detecting and processing method and system and programmable chip
CN105159813A (en) * 2015-08-05 2015-12-16 北京百度网讯科技有限公司 Data center based fault alarming method, apparatus, management device and system
CN108959025A (en) * 2018-06-27 2018-12-07 郑州云海信息技术有限公司 A kind of server alarm method, device and server
CN110780646A (en) * 2019-09-21 2020-02-11 苏州浪潮智能科技有限公司 Memory quality early warning method based on MES system

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1570859A (en) * 2003-07-16 2005-01-26 联想(北京)有限公司 Design method for avoiding misuse of non-ECC memory
CN1845029A (en) * 2005-11-11 2006-10-11 南京科远控制工程有限公司 Setting method for fault diagnosis and accident prediction
US20070168738A1 (en) * 2005-12-12 2007-07-19 Inventec Corporation Power-on error detection system and method
CN101539881A (en) * 2008-03-18 2009-09-23 环达电脑(上海)有限公司 Device and method for detecting memory errors
US20100017660A1 (en) * 2008-07-15 2010-01-21 Caterpillar Inc. System and method for protecting memory stacks using a debug unit
CN101741600A (en) * 2008-11-27 2010-06-16 英业达股份有限公司 Server system, recording equipment and management method thereof
CN101833492A (en) * 2010-04-15 2010-09-15 浪潮电子信息产业股份有限公司 Method for detecting memory failure
CN101908984A (en) * 2010-06-30 2010-12-08 杭州华三通信技术有限公司 Method and single board for detecting faults of memory
CN102222025A (en) * 2011-06-17 2011-10-19 华为数字技术有限公司 Method and device for eliminating memory failure

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103019898A (en) * 2012-11-26 2013-04-03 加弘科技咨询(上海)有限公司 Error reporting system for memory module detection and slot position traffic light positioning
CN103019898B (en) * 2012-11-26 2017-02-08 加弘科技咨询(上海)有限公司 Error reporting system for memory module detection and slot position traffic light positioning
CN103077103A (en) * 2013-01-18 2013-05-01 浪潮电子信息产业股份有限公司 Off-line diagnosing method for server faults
CN103500133A (en) * 2013-09-17 2014-01-08 华为技术有限公司 Fault locating method and device
CN103995768A (en) * 2014-06-10 2014-08-20 浪潮电子信息产业股份有限公司 Visual quick diagnosing method of server faults
CN104021054A (en) * 2014-06-11 2014-09-03 浪潮(北京)电子信息产业有限公司 Server fault visual detecting and processing method and system and programmable chip
CN105159813A (en) * 2015-08-05 2015-12-16 北京百度网讯科技有限公司 Data center based fault alarming method, apparatus, management device and system
CN105159813B (en) * 2015-08-05 2018-09-14 北京百度网讯科技有限公司 Fault alarm method, device, management equipment based on data center and system
CN108959025A (en) * 2018-06-27 2018-12-07 郑州云海信息技术有限公司 A kind of server alarm method, device and server
CN110780646A (en) * 2019-09-21 2020-02-11 苏州浪潮智能科技有限公司 Memory quality early warning method based on MES system
CN110780646B (en) * 2019-09-21 2021-11-26 苏州浪潮智能科技有限公司 Memory quality early warning method based on MES system

Similar Documents

Publication Publication Date Title
CN102609350A (en) Server memory failure alarm method
CN109783262B (en) Fault data processing method, device, server and computer readable storage medium
CN104268061B (en) A kind of storage state monitoring method suitable for virtual machine
CN106789306B (en) Method and system for detecting, collecting and recovering software fault of communication equipment
CN105045689A (en) Method for monitoring and alarming hard disks by using RAID card batch detection
CN104796273A (en) Method and device for diagnosing root of network faults
CN112395156A (en) Fault warning method and device, storage medium and electronic equipment
CN103401698A (en) Monitoring system used for alarming server status in server cluster operation
CN109491819A (en) A kind of method and system of diagnosis server failure
CN105607973B (en) Method, device and system for processing equipment fault in virtual machine system
CN111459782A (en) Method and device for monitoring business system, cloud platform system and server
CN102404141A (en) Method and device of alarm inhibition
JP2015109069A (en) Fault symptom notification apparatus, symptom notification method and symptom notification program
CN110763952A (en) Underground cable fault monitoring method and device
CN110784352B (en) Data synchronous monitoring and alarming method and device based on Oracle golden gate
CN105300447A (en) System and method for monitoring operation state of equipment
CN103605592A (en) Mechanism of detecting malfunctions of distributed computer system
CN103701657A (en) Device and method for monitoring and processing dysfunction of continuously running data processing system
CN112306871A (en) Data processing method, device, equipment and storage medium
JP2006268515A (en) Pci card trouble management system
CN103761157A (en) Method for implementing system fault-tolerant mechanism on basis of multitask patrol strategy
CN115766402B (en) Method and device for filtering server fault root cause, storage medium and electronic device
JP5803246B2 (en) Network operation management system, network monitoring server, network monitoring method and program
CN105955864A (en) Power supply fault processing method, power supply module, monitoring management module and server
CN103514086A (en) Extraction method and device for software error report

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20120725