CN103092739A - Memory error checking and correcting (ECC) error reporting and alarm mechanism - Google Patents
- Publication number
- CN103092739A
- Authority
- CN
- China
- Prior art keywords
- error
- reports
- ecc
- reporting
- alarm
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Debugging And Monitoring (AREA)
- Techniques For Improving Reliability Of Storages (AREA)
Abstract
The invention provides a memory error checking and correcting (ECC) error reporting and alarm mechanism and belongs to the field of computer technology. The mechanism is implemented on an Intel Boxboro-EX platform server and works as follows: when the server is running under high load and a memory error triggers the ECC error correction mechanism, the basic input/output system (BIOS) sets a counter to record the number of ECC error reports within a given period and uses it to assess the risk level of a system failure. When the risk level is low, the error information is recorded and no alarm is triggered; when the risk level is high, the error information is recorded, an alarm is triggered, and the user is reminded to maintain the system promptly. Compared with the prior art, the mechanism helps eliminate faults promptly and keeps the system in a healthy state.
Description
Technical field
The present invention relates to the field of computer technology, and specifically to a memory ECC error reporting and alarm mechanism that assesses the risk level of memory error reports and thereby facilitates system maintenance.
Background art
The existing alarm mechanism for memory ECC error reports does not distinguish between risk levels: as soon as an ECC error is reported, the BMC triggers an alarm immediately. This gives customers a bad impression and increases the maintenance burden on the server. For sporadic errors, the memory can complete the error correction by itself and the impact on the whole system is negligible; the risk level of such errors is extremely low for the system as a whole, and no alarm needs to be triggered. When ECC errors are reported in large numbers within a period of time, however, some component of the system may already be running in a high-risk state, and continued operation may significantly affect system stability. In this state a timely alarm is necessary: it helps eliminate faults promptly and keeps the system in a healthy state.
Summary of the invention
The technical task of the present invention is to overcome the deficiencies of the prior art by providing a memory ECC error reporting and alarm mechanism that assesses the risk level of memory error reports.
The technical solution of the present invention is realized as follows. The memory ECC error reporting and alarm mechanism comprises an Intel Boxboro-EX platform server, and its specific implementation steps are: when the server is running under high load and a memory error triggers the ECC error correction mechanism, the BIOS sets a counter to record the number of error reports within a given period and assesses the risk level of a system failure at the time of the error report. At a low risk level, the error information is recorded and no alarm is triggered; at a high risk level, the error information is recorded, an alarm is triggered, and the user is reminded to maintain the system promptly.
The detailed steps by which the BIOS assesses the risk level of a system failure are as follows. The BIOS sets an error counter and, at the same time, a threshold N for the number of error reports, and records the number n of ECC error reports within a fixed period T. If n does not reach the threshold within T, i.e. n < N, the BIOS notifies the BMC to record the error information only, without triggering an alarm. If n reaches or exceeds the threshold within T, i.e. n ≥ N, the BIOS sends the error information to the BMC and notifies it to trigger an alarm while recording the error information, reminding the user that the system has a fault so that it can be maintained promptly.
When the number n of error reports within the period T does not reach the threshold N, the counter is reset to zero and counting restarts after the BIOS has notified the BMC to record the error information.
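The counting scheme described above can be illustrated with a minimal Python sketch. It is not the patent's BIOS implementation: the class name `EccErrorCounter`, the method `record_error`, the returned action strings, and the use of seconds for T are all hypothetical. The sketch also assumes that a new period starts at the first error report after the previous period has elapsed, and that the counter is reset at that point even if an alarm was raised, which the text does not specify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EccErrorCounter:
    """Counts ECC error reports in a fixed period T against a threshold N."""
    threshold: int                        # N: number of reports that marks high risk
    period: float                         # T: length of the counting period (seconds, assumed)
    period_start: Optional[float] = None  # start time of the current period
    count: int = 0                        # n: reports seen in the current period

    def record_error(self, now: float) -> str:
        """Register one ECC error report at time `now` and return the action."""
        # Start a new period on the first report, or once the previous
        # period has elapsed: the counter is reset and counting restarts.
        if self.period_start is None or now - self.period_start >= self.period:
            self.period_start = now
            self.count = 0
        self.count += 1
        # n >= N: high risk, record and alarm; n < N: low risk, record only.
        return "log_and_alarm" if self.count >= self.threshold else "log_only"


# Three reports inside one 60-second period with N = 3:
counter = EccErrorCounter(threshold=3, period=60.0)
actions = [counter.record_error(t) for t in (0.0, 10.0, 20.0)]
# actions == ["log_only", "log_only", "log_and_alarm"]
```

With the same settings, reports at 0, 30 and 70 seconds never raise an alarm in this sketch, because the third report falls into a new period and the count starts again from one.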
Compared with the prior art, the present invention has the following beneficial effects:
By assessing the risk level of memory error reports, the memory ECC error reporting and alarm mechanism of the present invention only monitors low-risk errors without raising an alarm, and raises an alarm while monitoring high-risk errors. This reduces the number of system maintenance operations, extends the system's operating cycle, helps eliminate faults promptly, and keeps the system in a healthy state.
Description of drawings
Figure 1 is a block diagram of the implementation of the ECC alarm mechanism of the present invention.
Embodiment
The memory ECC error reporting and alarm mechanism of the present invention is described in detail below with reference to the accompanying drawing.
As shown in Figure 1, a memory ECC error reporting and alarm mechanism is provided. It comprises an Intel Boxboro-EX platform server, and its specific implementation steps are: when the server is running under high load and a memory error triggers the ECC error correction mechanism, the BIOS sets a counter to record the number of error reports within a given period and assesses the risk level of a system failure at the time of the error report. At a low risk level, the error information is recorded and no alarm is triggered; at a high risk level, the error information is recorded, an alarm is triggered, and the user is reminded to maintain the system promptly.
The detailed steps by which the BIOS assesses the risk level of a system failure are as follows. The BIOS sets an error counter and, at the same time, a threshold N for the number of error reports, and records the number n of ECC error reports within a fixed period T.
If n does not reach the threshold within T, i.e. n < N, the ECC errors are only occasional and the memory is fully capable of correcting them. In this case system performance and stability are essentially unaffected and the risk level is extremely low. The BIOS sends the ECC error information to the BMC and notifies it to record the information only, without triggering an alarm; the BIOS then resets the counter to zero and restarts counting.
If n reaches or exceeds the threshold within T, i.e. n ≥ N, the memory has been reporting errors frequently over a period of time. The memory can still complete the error correction, but system performance is affected; SDDC or DDDC may even have been triggered, or a memory module may have failed. In this case the system is already operating in an abnormal state or with reduced performance, and continued operation may lead to a crash or other unpredictable consequences. The BIOS sends the error information to the BMC and notifies it to trigger an alarm while recording the error information, reminding the user that the system has a fault so that it can be maintained promptly.
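The division of work between BIOS and BMC in this embodiment can be sketched as follows. This is an illustrative Python model only, not firmware code and not a real BMC or IPMI interface: `RiskLevel`, `EccReport`, `Bmc`, `assess_risk` and `bios_notify` are hypothetical names, and the alarm is modelled as a string appended to a list.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class RiskLevel(Enum):
    LOW = "low"
    HIGH = "high"


def assess_risk(n: int, threshold: int) -> RiskLevel:
    """BIOS side: n >= N is high risk, n < N is low risk."""
    return RiskLevel.HIGH if n >= threshold else RiskLevel.LOW


@dataclass
class EccReport:
    """Error information passed from BIOS to BMC (fields are assumptions)."""
    error_count: int
    risk: RiskLevel


@dataclass
class Bmc:
    """BMC side: always records the report, alarms only on high risk."""
    event_log: List[EccReport] = field(default_factory=list)
    alarms: List[str] = field(default_factory=list)

    def handle_report(self, report: EccReport) -> None:
        self.event_log.append(report)      # recorded in both cases
        if report.risk is RiskLevel.HIGH:  # alarm only at high risk
            self.alarms.append(
                f"ECC alarm: {report.error_count} error reports in the period"
            )


def bios_notify(bmc: Bmc, n: int, threshold: int) -> RiskLevel:
    """Assess the risk for count n and forward the report to the BMC."""
    risk = assess_risk(n, threshold)
    bmc.handle_report(EccReport(error_count=n, risk=risk))
    return risk
```

Calling `bios_notify(bmc, 2, 5)` and then `bios_notify(bmc, 5, 5)` on a fresh `Bmc()` leaves two entries in `event_log` and one in `alarms`, which mirrors the rule that every error report is recorded but only high-risk ones alert the user.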
Claims (3)
1. A memory ECC error reporting and alarm mechanism, characterized in that it comprises an Intel Boxboro-EX platform server, and its specific implementation steps are: when the server is running under high load and a memory error triggers the ECC error correction mechanism, the BIOS sets a counter to record the number of error reports within a given period and assesses the risk level of a system failure at the time of the error report; at a low risk level, the error information is recorded and no alarm is triggered; at a high risk level, the error information is recorded, an alarm is triggered, and the user is reminded to maintain the system promptly.
2. The memory ECC error reporting and alarm mechanism according to claim 1, characterized in that the detailed steps by which the BIOS assesses the risk level of a system failure are: the BIOS sets an error counter and, at the same time, a threshold N for the number of error reports, and records the number n of ECC error reports within a fixed period T; if n does not reach the threshold within T, i.e. n < N, the BIOS notifies the BMC to record the error information only, without triggering an alarm; if n reaches or exceeds the threshold within T, i.e. n ≥ N, the BIOS sends the error information to the BMC and notifies it to trigger an alarm while recording the error information, reminding the user that the system has a fault so that it can be maintained promptly.
3. The memory ECC error reporting and alarm mechanism according to claim 2, characterized in that, when the number n of error reports within the period T does not reach the threshold N, the counter is reset to zero and counting restarts after the BIOS has notified the BMC to record the error information.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2013100188001A CN103092739A (en) | 2013-01-18 | 2013-01-18 | Memory error checking and correcting (ECC) error reporting and alarm mechanism |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2013100188001A CN103092739A (en) | 2013-01-18 | 2013-01-18 | Memory error checking and correcting (ECC) error reporting and alarm mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN103092739A (en) | 2013-05-08 |
Family
ID=48205341
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2013100188001A Pending CN103092739A (en) | 2013-01-18 | 2013-01-18 | Memory error checking and correcting (ECC) error reporting and alarm mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103092739A (en) |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103605597A (en) * | 2013-11-20 | 2014-02-26 | 中国科学院数据与通信保护研究教育中心 | Configurable computer protection system and method |
CN104268052A (en) * | 2014-10-21 | 2015-01-07 | 浪潮电子信息产业股份有限公司 | Memory Rank Spare testing method based on ITP tool |
CN104317690A (en) * | 2014-10-21 | 2015-01-28 | 浪潮电子信息产业股份有限公司 | Memory Demand Scrub testing method based on ITP (integration test platform) tool |
CN105117301A (en) * | 2015-08-14 | 2015-12-02 | 杭州华为数字技术有限公司 | Memory warning method and apparatus |
CN105426288A (en) * | 2015-11-10 | 2016-03-23 | 浪潮电子信息产业股份有限公司 | Optimization method of memory alarm |
CN105589789A (en) * | 2015-12-25 | 2016-05-18 | 浪潮电子信息产业股份有限公司 | Method for dynamically adjusting memory monitoring threshold value |
CN107608634A (en) * | 2017-09-25 | 2018-01-19 | 四川长虹电器股份有限公司 | Android system spatial processing method |
CN109101377A (en) * | 2018-07-18 | 2018-12-28 | 郑州云海信息技术有限公司 | A kind of test method of memory SDDC |
CN109213038A (en) * | 2018-08-30 | 2019-01-15 | 武汉携康智能健康设备有限公司 | A kind of archives intelligent and safe management system |
WO2019052208A1 (en) * | 2017-09-18 | 2019-03-21 | 华为技术有限公司 | Method and apparatus for memory evaluation |
CN110008056A (en) * | 2019-03-28 | 2019-07-12 | 联想(北京)有限公司 | EMS memory management process, device, electronic equipment and computer readable storage medium |
CN110008090A (en) * | 2019-04-15 | 2019-07-12 | 苏州浪潮智能科技有限公司 | A kind of method, apparatus and computer readable storage medium monitoring EMS memory error |
CN110032869A (en) * | 2019-04-19 | 2019-07-19 | 湖南科技学院 | A kind of cloud computing protection early warning system based on big data |
CN110308362A (en) * | 2019-04-16 | 2019-10-08 | 惠科股份有限公司 | Detection circuit and display panel |
CN110780646A (en) * | 2019-09-21 | 2020-02-11 | 苏州浪潮智能科技有限公司 | Memory quality early warning method based on MES system |
CN110781027A (en) * | 2019-10-29 | 2020-02-11 | 苏州浪潮智能科技有限公司 | Method, device and equipment for determining error reporting threshold of memory ECC (error correction code) |
CN111209129A (en) * | 2019-12-27 | 2020-05-29 | 曙光信息产业股份有限公司 | Memory optimization method and device based on AMD platform |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4873687A (en) * | 1987-10-05 | 1989-10-10 | Ibm Corporation | Failing resource manager in a multiplex communication system |
CN1570859A (en) * | 2003-07-16 | 2005-01-26 | 联想(北京)有限公司 | Design method for avoiding misuse of non-ECC memory |
CN101539881A (en) * | 2008-03-18 | 2009-09-23 | 环达电脑(上海)有限公司 | Device and method for detecting memory errors |
CN102135925A (en) * | 2010-12-27 | 2011-07-27 | 西安锐信科技有限公司 | Method and device for detecting error check and correcting memory |
- 2013-01-18 CN CN2013100188001A patent/CN103092739A/en active Pending
Non-Patent Citations (1)
Title |
---|
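田昕 (Tian Xin): "Research on Fault Management Mechanisms for Computer Hardware Devices" (计算机硬件设备故障管理机制研究), China Excellent Master's Theses Full-text Database (《中国优秀硕士学位论文全文库》) * |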
Cited By (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103605597A (en) * | 2013-11-20 | 2014-02-26 | 中国科学院数据与通信保护研究教育中心 | Configurable computer protection system and method |
CN103605597B (en) * | 2013-11-20 | 2017-01-18 | 中国科学院数据与通信保护研究教育中心 | Configurable computer protection system and method |
CN104268052A (en) * | 2014-10-21 | 2015-01-07 | 浪潮电子信息产业股份有限公司 | Memory Rank Spare testing method based on ITP tool |
CN104317690A (en) * | 2014-10-21 | 2015-01-28 | 浪潮电子信息产业股份有限公司 | Memory Demand Scrub testing method based on ITP (integration test platform) tool |
CN104317690B (en) * | 2014-10-21 | 2016-01-27 | 浪潮电子信息产业股份有限公司 | A kind of Memory Demand Scrub method of testing based on ITP instrument |
CN104268052B (en) * | 2014-10-21 | 2016-02-03 | 浪潮电子信息产业股份有限公司 | A kind of Memory Rank Spare method of testing based on ITP instrument |
CN105117301B (en) * | 2015-08-14 | 2018-08-14 | 杭州华为数字技术有限公司 | A kind of method and device of memory early warning |
CN105117301A (en) * | 2015-08-14 | 2015-12-02 | 杭州华为数字技术有限公司 | Memory warning method and apparatus |
CN105426288A (en) * | 2015-11-10 | 2016-03-23 | 浪潮电子信息产业股份有限公司 | Optimization method of memory alarm |
CN105589789A (en) * | 2015-12-25 | 2016-05-18 | 浪潮电子信息产业股份有限公司 | Method for dynamically adjusting memory monitoring threshold value |
US11354183B2 (en) | 2017-09-18 | 2022-06-07 | Huawei Technologies Co., Ltd. | Memory evaluation method and apparatus |
WO2019052208A1 (en) * | 2017-09-18 | 2019-03-21 | 华为技术有限公司 | Method and apparatus for memory evaluation |
CN109522175A (en) * | 2017-09-18 | 2019-03-26 | 华为技术有限公司 | A kind of method and device of memory assessment |
US11868201B2 (en) | 2017-09-18 | 2024-01-09 | Huawei Technologies Co., Ltd. | Memory evaluation method and apparatus |
CN107608634A (en) * | 2017-09-25 | 2018-01-19 | 四川长虹电器股份有限公司 | Android system spatial processing method |
CN109101377A (en) * | 2018-07-18 | 2018-12-28 | 郑州云海信息技术有限公司 | A kind of test method of memory SDDC |
CN109213038A (en) * | 2018-08-30 | 2019-01-15 | 武汉携康智能健康设备有限公司 | A kind of archives intelligent and safe management system |
CN110008056A (en) * | 2019-03-28 | 2019-07-12 | 联想(北京)有限公司 | EMS memory management process, device, electronic equipment and computer readable storage medium |
CN110008090A (en) * | 2019-04-15 | 2019-07-12 | 苏州浪潮智能科技有限公司 | A kind of method, apparatus and computer readable storage medium monitoring EMS memory error |
CN110308362A (en) * | 2019-04-16 | 2019-10-08 | 惠科股份有限公司 | Detection circuit and display panel |
CN110032869A (en) * | 2019-04-19 | 2019-07-19 | 湖南科技学院 | A kind of cloud computing protection early warning system based on big data |
CN110032869B (en) * | 2019-04-19 | 2022-08-09 | 湖南科技学院 | Cloud computing protection early warning system based on big data |
CN110780646A (en) * | 2019-09-21 | 2020-02-11 | 苏州浪潮智能科技有限公司 | Memory quality early warning method based on MES system |
CN110781027A (en) * | 2019-10-29 | 2020-02-11 | 苏州浪潮智能科技有限公司 | Method, device and equipment for determining error reporting threshold of memory ECC (error correction code) |
CN110781027B (en) * | 2019-10-29 | 2023-01-10 | 苏州浪潮智能科技有限公司 | Method, device and equipment for determining error reporting threshold of memory ECC (error correction code) |
CN111209129A (en) * | 2019-12-27 | 2020-05-29 | 曙光信息产业股份有限公司 | Memory optimization method and device based on AMD platform |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103092739A (en) | Memory error checking and correcting (ECC) error reporting and alarm mechanism | |
US10459815B2 (en) | Method and system for predicting storage device failures | |
US11119874B2 (en) | Memory fault detection | |
CN105659215B (en) | A kind of fault handling method, relevant apparatus and computer | |
CN105843699B (en) | Dynamic random access memory device and method for error monitoring and correction | |
CN105117301B (en) | A kind of method and device of memory early warning | |
CN106682162B (en) | Log management method and device | |
TWI610169B (en) | Method and processor for writing, and error tracking a log subsystem of a file system | |
CN205881469U (en) | Fault detection equipment of electronic equipment and memory that is used for having a plurality of memory locations of standing transient fault and permanent fault | |
CN107145410A (en) | After a kind of system exception power down it is automatic on establish the method, system and equipment of machine by cable | |
CN109710501B (en) | Method and system for detecting data transmission stability of server | |
CN102591591A (en) | Disk detection system, disk detection method and network storage system | |
CN104424078A (en) | Method and system for reducing floating alarm information | |
CN109491819A (en) | A kind of method and system of diagnosis server failure | |
CN109002384A (en) | A kind of alarm method of server failure, device, equipment and storage medium | |
CN105426288A (en) | Optimization method of memory alarm | |
CN105589789A (en) | Method for dynamically adjusting memory monitoring threshold value | |
CN102609350A (en) | Server memory failure alarm method | |
US7269764B2 (en) | Monitoring VRM-induced memory errors | |
WO2020000956A1 (en) | Method, apparatus and device for bmc monitoring of correctable ecc errors | |
CN105247490B (en) | For the optimization method used of the nonvolatile memory of the motor vehicles computer of building blocks of function monitoring | |
CN109389697A (en) | Recording method, equipment and the readable storage medium storing program for executing of underground inspection data inputting time | |
CN108204331B (en) | Fault processing method and device for wind generating set | |
CN106201753B (en) | Method and system for processing PCIE errors in linux | |
CN103916272A (en) | Main control single board and fault detecting method thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20130508 |