CN105117301A - Memory warning method and apparatus


Info

Publication number
CN105117301A
Authority
CN
China
Prior art keywords
warning
funnel
threshold value
frequency
modes
Legal status
Granted
Application number
CN201510500335.4A
Other languages
Chinese (zh)
Other versions
CN105117301B (en)
Inventor
宋刚
Current Assignee
XFusion Digital Technologies Co Ltd
Original Assignee
Hangzhou Huawei Digital Technologies Co Ltd
Application filed by Hangzhou Huawei Digital Technologies Co Ltd
Priority to CN201510500335.4A
Publication of CN105117301A
Application granted
Publication of CN105117301B
Legal status: Active


Landscapes

  • Debugging And Monitoring (AREA)
  • Filling Or Emptying Of Bunkers, Hoppers, And Tanks (AREA)

Abstract

The present invention discloses a memory warning method and apparatus, and belongs to the field of server technologies. The method comprises: obtaining the currently set warning mode, and obtaining the funnel frequency and the warning threshold corresponding to the warning mode, where the funnel frequency is the preset number of subtraction operations performed on a funnel counter per second; recording, in the funnel counter, the number of correctable errors that occur in the currently monitored memory, and performing subtraction operations on the funnel counter according to the funnel frequency; and when it is detected that the count recorded in the funnel counter reaches the warning threshold and all repair methods have been used, sending a warning to a baseboard management controller (BMC). The method and apparatus provided by the present invention take the time dimension into account when warning on correctable memory errors, thereby improving the accuracy and effectiveness of the warnings.

Description

Memory warning method and apparatus
Technical field
The present invention relates to the field of server technologies, and in particular, to a memory warning method and apparatus.
Background
With the development of server technologies, the memory capacity configured in servers keeps increasing and memory runs at ever higher speeds. This large-capacity, high-speed memory has become the most failure-prone component affecting system stability, and detecting and handling memory faults before a catastrophic failure occurs has become both an important requirement and a technical difficulty for system stability and reliability.
At present, a funnel counter is provided in the server to record the number of correctable errors that occur on each memory module. The BMC automatically polls the funnel counter at regular intervals; when the monitored number of correctable errors reaches a preset threshold, a warning is triggered to prompt server administrators to handle the fault.
The inventor found that the prior art has at least the following problem:
Some memory faults are soft faults (for example, bit flips caused by cosmic rays). Such faults return to normal within a certain period and do not develop into uncorrectable errors. Warning by pure counting ignores the time dimension, which leads to false warnings and, in turn, to inefficient server operation and maintenance.
Summary of the invention
To solve the problems in the prior art, embodiments of the present invention provide a memory warning method and apparatus. The technical solutions are as follows:
According to a first aspect, an embodiment of the present invention provides a memory warning method, the method comprising:
obtaining the currently set warning mode, and obtaining the funnel frequency and warning threshold corresponding to the warning mode, where the funnel frequency is the number of subtraction operations performed per second on a preset funnel counter;
recording, in the funnel counter, the number of correctable errors that occur in the currently monitored memory, and performing subtraction operations on the funnel counter according to the funnel frequency; and
when it is detected that the count recorded in the funnel counter reaches the warning threshold and all repair methods have been used, sending a warning to a baseboard management controller (BMC).
In a first possible implementation of the first aspect, obtaining the warning threshold corresponding to the warning mode comprises:
obtaining a duration parameter and a multiplier parameter corresponding to the warning mode; and
calculating the warning threshold corresponding to the warning mode according to the duration parameter, the multiplier parameter and the funnel frequency.
With reference to the first possible implementation of the first aspect, in a second possible implementation of the first aspect, after the warning threshold corresponding to the warning mode is calculated, the method further comprises:
if the calculated warning threshold exceeds an upper limit, setting the warning threshold to the upper limit.
With reference to the first possible implementation of the first aspect, in a third possible implementation of the first aspect, the warning mode comprises an accuracy mode and a coverage mode;
the funnel frequency corresponding to the accuracy mode is greater than the funnel frequency corresponding to the coverage mode; and
the duration parameter corresponding to the accuracy mode is greater than the duration parameter corresponding to the coverage mode.
With reference to the first possible implementation of the first aspect, in a fourth possible implementation of the first aspect, the funnel frequency is between 1 and 100 per second, the duration parameter is between 3 and 60 seconds, and the multiplier parameter is between 10 and 100.
In a fifth possible implementation of the first aspect, sending a warning to the BMC when it is detected that the count recorded in the funnel counter reaches the warning threshold and all repair methods have been used comprises:
when it is detected that the count recorded in the funnel counter reaches the warning threshold, triggering a system management interrupt (SMI); and
when the SMI interrupt handler detects that all repair methods have been used, sending a warning notification message to the BMC.
According to a second aspect, an embodiment of the present invention provides a memory warning apparatus, the apparatus comprising:
an acquisition module, configured to obtain the currently set warning mode, and obtain the funnel frequency and warning threshold corresponding to the warning mode, where the funnel frequency is the number of subtraction operations performed per second on a preset funnel counter;
a processing module, configured to record, in the funnel counter, the number of correctable errors that occur in the currently monitored memory, and perform subtraction operations on the funnel counter according to the funnel frequency; and
a warning module, configured to send a warning to the baseboard management controller BMC when it is detected that the count recorded in the funnel counter reaches the warning threshold and all repair methods have been used.
In a first possible implementation of the second aspect, the acquisition module comprises:
an acquiring unit, configured to obtain a duration parameter and a multiplier parameter corresponding to the warning mode; and
a computing unit, configured to calculate the warning threshold corresponding to the warning mode according to the duration parameter, the multiplier parameter and the funnel frequency.
With reference to the first possible implementation of the second aspect, in a second possible implementation of the second aspect, the acquisition module further comprises:
a setting unit, configured to set the warning threshold to the upper limit if the calculated warning threshold exceeds the upper limit.
With reference to the first possible implementation of the second aspect, in a third possible implementation of the second aspect, the warning mode comprises an accuracy mode and a coverage mode;
the funnel frequency corresponding to the accuracy mode is greater than the funnel frequency corresponding to the coverage mode; and
the duration parameter corresponding to the accuracy mode is greater than the duration parameter corresponding to the coverage mode.
With reference to the first possible implementation of the second aspect, in a fourth possible implementation of the second aspect, the funnel frequency is between 1 and 100 per second, the duration parameter is between 3 and 60 seconds, and the multiplier parameter is between 10 and 100.
In a fifth possible implementation of the second aspect, the warning module comprises:
an interrupt unit, configured to trigger a system management interrupt (SMI) when it is detected that the count recorded in the funnel counter reaches the warning threshold; and
a transmitting unit, configured to send a warning notification message to the BMC when the SMI interrupt handler detects that all repair methods have been used.
The technical solutions provided by the embodiments of the present invention bring the following beneficial effects:
The funnel frequency and warning threshold of the current warning mode are obtained; while the funnel counter counts the correctable errors of each memory module, subtraction operations are performed on it at the funnel frequency, and a warning is sent to the BMC when the counter reaches the warning threshold and all repair methods have been used. This implements warning on correctable memory errors along the time dimension and improves the accuracy and effectiveness of the warnings.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. Evidently, the accompanying drawings in the following description show merely some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of the memory warning method provided in Embodiment 1 of the present invention;
Fig. 2 is a schematic diagram of the memory-failure warning mathematical model in the memory warning method provided in Embodiment 1 of the present invention;
Fig. 3 is a flowchart of the memory warning method provided in Embodiment 2 of the present invention;
Fig. 4 is a schematic structural diagram of the memory warning apparatus provided in Embodiment 3 of the present invention;
Fig. 5 is a schematic structural diagram of the server provided in Embodiment 4 of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the embodiments of the present invention are described below in further detail with reference to the accompanying drawings.
Embodiment 1
This embodiment of the present invention provides a memory warning method. Referring to Fig. 1, the method comprises:
101: Obtain the currently set warning mode, and obtain the funnel frequency and warning threshold corresponding to the warning mode, where the funnel frequency is the number of subtraction operations performed per second on the preset funnel counter;
102: Record, in the funnel counter, the number of correctable errors that occur in the currently monitored memory, and perform subtraction operations on the funnel counter according to the funnel frequency;
103: When it is detected that the count recorded in the funnel counter reaches the warning threshold and all repair methods have been used, send a warning to the BMC (Baseboard Management Controller).
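Steps 101 to 103 can be sketched as a small software model of the funnel (leaky-bucket) counter. This is a minimal sketch under stated assumptions: the names `FunnelCounter` and `should_warn_bmc` are hypothetical, the counter is modeled in software rather than in the server's own counting mechanism, and the subtraction is applied once per elapsed second rather than spread across the second.

```python
from dataclasses import dataclass


@dataclass
class FunnelCounter:
    """Hypothetical software model of the per-module funnel counter."""

    funnel_frequency: int   # subtraction operations applied per second
    warning_threshold: int  # count at which a warning becomes possible
    count: int = 0          # correctable errors currently held in the funnel

    def record_errors(self, n: int) -> None:
        """Step 102: add newly observed correctable errors."""
        self.count += n

    def leak(self, seconds: int = 1) -> None:
        """Step 102: subtract funnel_frequency per elapsed second, never below zero."""
        self.count = max(0, self.count - self.funnel_frequency * seconds)

    def reached_threshold(self) -> bool:
        return self.count >= self.warning_threshold


def should_warn_bmc(counter: FunnelCounter, all_repairs_used: bool) -> bool:
    """Step 103: warn only when the threshold is reached AND every repair is used up."""
    return counter.reached_threshold() and all_repairs_used


# Usage: 20 errors/s against a funnel frequency of 10/s nets +10 per second.
counter = FunnelCounter(funnel_frequency=10, warning_threshold=50)
for _ in range(5):
    counter.record_errors(20)
    counter.leak()
print(counter.count, should_warn_bmc(counter, all_repairs_used=True))  # prints: 50 True
```

The two-part condition in `should_warn_bmc` mirrors step 103: reaching the threshold alone is not enough while repair methods remain.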
Fig. 2 shows the memory-failure warning mathematical model. In this model, the horizontal axis represents the frequency at which correctable errors occur, and the vertical axis represents the probability of a catastrophic system failure at that ECC error frequency. A qualitative analysis of memory failures based on this model yields the following conclusions:
The higher the frequency of correctable errors, the greater the probability of a catastrophic system failure, that is, an uncorrectable error;
The higher the funnel frequency selected as the threshold for the correctable-error frequency, the higher the accuracy of the warning, but the lower its coverage.
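The accuracy/coverage trade-off can be illustrated with a short Python sketch. It assumes each memory module produces correctable errors at a steady rate and that the warning threshold is large compared with one second's worth of errors; under those assumptions the funnel counter only keeps growing, and so can eventually warn, when the error rate exceeds the funnel frequency. The function name `caught_rates` and the sample rates are hypothetical.

```python
def caught_rates(error_rates, funnel_frequency):
    """Return the sustained error rates (errors/second) that make the funnel
    counter grow without bound, i.e. rates that can eventually reach a large
    warning threshold."""
    return [rate for rate in error_rates if rate > funnel_frequency]


sample_rates = [2, 20, 80, 150]  # hypothetical steady error rates, errors/second
print(caught_rates(sample_rates, funnel_frequency=100))  # prints: [150]
print(caught_rates(sample_rates, funnel_frequency=1))    # prints: [2, 20, 80, 150]
```

Raising the funnel frequency from 1 to 100 per second shrinks the set of rates that can ever trigger a warning from all four sample rates to only the highest one: fewer, more reliable warnings (accuracy) at the cost of missing lower-rate errors (coverage).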
In this embodiment of the present invention, the funnel frequency and warning threshold of the current warning mode are obtained; while the funnel counter counts the correctable errors of each memory module, subtraction operations are performed on it at the funnel frequency, and a warning is sent to the BMC when the counter reaches the warning threshold and all repair methods have been used. This implements warning on correctable memory errors along the time dimension and improves the accuracy and effectiveness of the warnings.
Embodiment 2
This embodiment of the present invention provides a memory warning method. Referring to Fig. 3, the method comprises:
301: Obtain the currently set warning mode, and obtain the funnel frequency and warning threshold corresponding to the warning mode, where the funnel frequency is the number of subtraction operations performed per second on the preset funnel counter.
The warning modes include an accuracy mode and a coverage mode.
Memory errors that trigger a warning in the accuracy mode must have the following characteristics: they occur at a high frequency and persist for some time. In the coverage mode, the configured funnel frequency is lower than a certain threshold.
Each warning mode contains configurable parameters, which may include the funnel frequency and the warning threshold. Optionally, the warning threshold may be calculated from other parameters of the warning mode; correspondingly, the parameters used to calculate it may include a duration parameter and a multiplier parameter. The duration parameter is the length of time for which memory errors persist: if it is too short, a fault cannot be confirmed, and the longer it is, the more accurate the judgment.
Comparing the accuracy mode with the coverage mode therefore leads to the following conclusions:
The funnel frequency of the accuracy mode is greater than that of the coverage mode;
the higher the accuracy requirement, the higher the funnel frequency.
The duration parameter of the accuracy mode is greater than that of the coverage mode.
The funnel frequency is between 1 and 100 per second, the duration parameter is between 3 and 60 seconds, and the multiplier parameter is between 10 and 100.
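The parameter ranges and the ordering between the two modes can be checked mechanically, for instance when an administrator edits the settings. This is a sketch only: `ModeParams`, `in_documented_ranges` and `modes_consistent` are hypothetical names, the sample values are borrowed from the worked examples later in this embodiment, and the text does not say that such validation is performed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeParams:
    """Hypothetical container for one warning mode's parameters."""

    funnel_frequency: int  # subtraction operations per second, expected 1..100
    duration_s: int        # duration parameter in seconds, expected 3..60
    multiplier: int        # multiplier parameter, expected 10..100


def in_documented_ranges(p: ModeParams) -> bool:
    """Check each parameter against the ranges stated in this embodiment."""
    return (1 <= p.funnel_frequency <= 100
            and 3 <= p.duration_s <= 60
            and 10 <= p.multiplier <= 100)


def modes_consistent(accuracy: ModeParams, coverage: ModeParams) -> bool:
    """Accuracy mode must use a higher funnel frequency and a longer duration."""
    return (accuracy.funnel_frequency > coverage.funnel_frequency
            and accuracy.duration_s > coverage.duration_s)


accuracy = ModeParams(funnel_frequency=10, duration_s=50, multiplier=10)
coverage = ModeParams(funnel_frequency=1, duration_s=3, multiplier=10)
print(in_documented_ranges(accuracy), in_documented_ranges(coverage),
      modes_consistent(accuracy, coverage))  # prints: True True True
```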
Because the rate at which correctable errors actually occur is higher than the funnel frequency, the warning threshold is amplified 10 to 100 times by the multiplier parameter to allow for higher correctable-error rates; the higher the accuracy requirement, the larger the multiplier.
Based on the characteristics of each warning mode, server administrators can configure server management to select the warning mode that meets their current needs.
The existing funnel-counting mechanism for correctable memory errors also provides the funnel frequency as a configurable parameter. Once the funnel frequency is set, the funnel counter, while recording the number of correctable errors on each monitored memory module, also performs subtraction operations at the funnel frequency, reducing the count it holds.
The causes of memory failure fall broadly into soft faults (such as bit flips caused by cosmic rays), transient faults (such as crosstalk on data lines) and hard faults (such as damage to or failure of a bit in a memory chip).
A soft fault may cause a burst of memory errors that is momentary rather than persistent; such errors are repaired quickly, and the same error does not recur after a certain time. However, the errors caused by a soft fault are still recorded in the funnel counter, and once the recorded count reaches the specified threshold a warning is triggered, producing a false warning. Performing subtraction operations on the funnel counter at the funnel frequency makes the soft-fault error count recorded in the counter decrease over time, so that no warning is triggered.
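The effect of the subtraction operations on soft versus hard faults can be illustrated with a small replay. This is a minimal sketch assuming one-second ticks and invented error traces (not measured data): each second the errors are added, the threshold is checked, and then the funnel frequency is subtracted. Setting the funnel frequency to 0 models the prior-art pure counting; the function name `reaches_threshold` is hypothetical.

```python
def reaches_threshold(errors_per_second, funnel_frequency, threshold):
    """Replay a per-second error trace through a funnel counter.

    Each second: add that second's correctable errors, check the threshold,
    then subtract funnel_frequency (floored at zero). A funnel_frequency of 0
    models pure counting with no time dimension."""
    count = 0
    for errors in errors_per_second:
        count += errors
        if count >= threshold:
            return True
        count = max(0, count - funnel_frequency)
    return False


# Hypothetical traces, not measured data.
soft_fault = ([100] + [0] * 29) * 5  # short bursts every 30 s that die out
hard_fault = [50] * 60               # a steady 50 errors/s for one minute

print(reaches_threshold(soft_fault, 0, 300))   # prints: True  (pure counting, false warning)
print(reaches_threshold(soft_fault, 10, 300))  # prints: False (funnel drains each burst)
print(reaches_threshold(hard_fault, 10, 300))  # prints: True  (persistent fault still caught)
```

With a threshold of 300, pure counting eventually warns on the intermittent soft-fault bursts, whereas a funnel frequency of 10 per second empties the counter before the next burst arrives; the persistent hard fault still reaches the threshold within about eight seconds.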
In this embodiment, step 301 may be implemented by the following steps:
3011: Obtain the currently set warning mode;
3012: Obtain the funnel frequency corresponding to the warning mode;
3013: Obtain the duration parameter and multiplier parameter corresponding to the warning mode;
3014: Calculate the warning threshold corresponding to the warning mode according to the duration parameter, the multiplier parameter and the funnel frequency.
The warning threshold may be calculated as:
Warning threshold = duration parameter × funnel frequency × multiplier parameter;
For example, with a funnel frequency of 10/second, a duration parameter of 50 seconds and a multiplier parameter of 10, the warning threshold is 5000.
In addition, for the extreme case where the accuracy requirement is low and missed warnings must be minimized:
funnel frequency = 1/second, duration parameter = 3 seconds, multiplier parameter = 10, giving a warning threshold of 30;
A typical real system has a fairly high accuracy requirement but must also keep missed warnings low, so it may be configured as follows:
funnel frequency = 1/second, duration parameter = 60 seconds, multiplier parameter = 100, giving a warning threshold of 6000.
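Steps 3011 to 3014 and the formula above can be sketched as a preset lookup followed by a multiplication. The table `MODE_PRESETS` and its values are hypothetical: they reuse the first two examples above as stand-ins for an accuracy mode and a coverage mode (which satisfies the rule that the accuracy mode has the larger funnel frequency and duration), since the text does not state the actual presets.

```python
# Hypothetical presets; values borrowed from the worked examples, not from firmware.
MODE_PRESETS = {
    "accuracy": {"funnel_frequency": 10, "duration_s": 50, "multiplier": 10},
    "coverage": {"funnel_frequency": 1, "duration_s": 3, "multiplier": 10},
}


def warning_threshold(duration_s, funnel_frequency, multiplier):
    """Warning threshold = duration parameter x funnel frequency x multiplier parameter."""
    return duration_s * funnel_frequency * multiplier


def load_mode(mode):
    """Steps 3011-3014: read the mode's parameters and compute its warning threshold."""
    p = MODE_PRESETS[mode]                   # 3011: the currently set warning mode
    funnel_frequency = p["funnel_frequency"]  # 3012
    duration_s, multiplier = p["duration_s"], p["multiplier"]  # 3013
    return funnel_frequency, warning_threshold(duration_s, funnel_frequency, multiplier)  # 3014


print(load_mode("accuracy"))  # prints: (10, 5000)
print(load_mode("coverage"))  # prints: (1, 30)
print(warning_threshold(60, 1, 100))  # prints: 6000 (the "typical real system" example)
```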
Furthermore, in extreme cases with a very high accuracy requirement, the calculated warning threshold may be too large, so the warning threshold needs an upper limit: if the warning threshold is detected to exceed the upper limit, it is reset to the upper limit. When this situation is detected, step 3015 is performed.
3015: If the calculated warning threshold exceeds the upper limit, set the warning threshold to the upper limit.
For example, in the extreme case with a very high accuracy requirement:
funnel frequency = 100/second, duration parameter = 60 seconds, multiplier parameter = 100, giving a calculated warning threshold of 600000; because the upper limit is 32767, the warning threshold is set to 32767.
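Step 3015 amounts to capping the product with `min`. A minimal sketch, with the hypothetical helper name `clamped_threshold`; the upper limit 32767 is taken from the example above and happens to equal the largest signed 16-bit integer (2^15 - 1), although the text does not say why that value was chosen.

```python
UPPER_LIMIT = 32767  # upper limit from the example above (2**15 - 1)


def clamped_threshold(duration_s, funnel_frequency, multiplier, upper_limit=UPPER_LIMIT):
    """Steps 3014-3015: compute the warning threshold, then cap it at the upper limit."""
    return min(duration_s * funnel_frequency * multiplier, upper_limit)


print(clamped_threshold(60, 100, 100))  # prints: 32767 (600000 is capped)
print(clamped_threshold(50, 10, 10))    # prints: 5000  (below the limit, unchanged)
```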
302: Record, in the funnel counter, the number of correctable errors that occur in the currently monitored memory, and perform subtraction operations on the funnel counter according to the funnel frequency.
For example, if correctable errors occur at 100 per second and the funnel frequency is 50 per second, the count in the funnel counter increases by a net 50 per second.
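The arithmetic in this example extends to a rough estimate of how long a steady error rate takes to fill the counter. This sketch treats the counter as changing by the net amount once per second and uses hypothetical helper names; it is an approximation, not the exact update rule of the counting mechanism.

```python
import math


def net_growth_per_second(error_rate, funnel_frequency):
    """Errors added per second minus subtractions per second, floored at zero."""
    return max(0, error_rate - funnel_frequency)


def seconds_to_threshold(error_rate, funnel_frequency, threshold):
    """Approximate seconds for a steady error rate to fill the counter; None if it never does."""
    growth = net_growth_per_second(error_rate, funnel_frequency)
    if growth == 0:
        return None
    return math.ceil(threshold / growth)


print(net_growth_per_second(100, 50))       # prints: 50
print(seconds_to_threshold(100, 50, 5000))  # prints: 100
print(seconds_to_threshold(40, 50, 5000))   # prints: None
```

An error rate at or below the funnel frequency never fills the counter, which is how transient bursts are filtered out.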
303: When it is detected that the count recorded in the funnel counter reaches the warning threshold and all repair methods have been used, send a warning to the BMC.
When the count recorded in the funnel counter reaches the warning threshold, RAS (Reliability, Availability, Serviceability) memory repair operations may be started. RAS memory repair operations can resolve the failure of a memory chip, but at a corresponding cost.
When the count recorded in the funnel counter reaches the warning threshold and all of the various repair methods have been used, the Device Tagging function needs to be performed. Device Tagging can be used only after all of the repair functions above have been used up and the funnel counter still reaches the threshold, and it costs the ability to correct single-bit errors.
Once Device Tagging has been performed, all RAS processing actions have been used up, and the SMI interrupt handler sends a pre-warning to notify the BMC.
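The escalation order described here (RAS memory repairs first, Device Tagging last, then a warning to the BMC) can be modeled as a simple ladder. The step names other than Device Tagging are placeholders, because the text does not list the individual RAS repair methods, and the function name `next_action` is hypothetical.

```python
# Hypothetical repair ladder: only the ordering "RAS repairs, then Device Tagging,
# then warn the BMC" comes from the text; the RAS step names are placeholders.
REPAIR_LADDER = ["ras_repair_1", "ras_repair_2", "device_tagging"]


def next_action(used):
    """Return the next repair to apply, or 'warn_bmc' once every repair has been used."""
    for repair in REPAIR_LADDER:
        if repair not in used:
            return repair
    return "warn_bmc"


print(next_action(set()))                             # prints: ras_repair_1
print(next_action({"ras_repair_1", "ras_repair_2"}))  # prints: device_tagging
print(next_action(set(REPAIR_LADDER)))                # prints: warn_bmc
```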
The warning may be sent as follows:
3031: When it is detected that the count recorded in the funnel counter reaches the warning threshold, trigger an SMI (System Management Interrupt);
3032: When the SMI interrupt handler detects that all repair methods have been used, send a warning notification message to the BMC.
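Steps 3031 and 3032 can be sketched as a handler that receives the counter state and a callback standing in for the BMC notification channel. Everything here is a hypothetical software model: real SMI handlers run in firmware in System Management Mode, and the actual BMC messaging interface is not described in this text.

```python
from typing import Callable


def smi_handler(count: int, threshold: int, repairs_remaining: int,
                notify_bmc: Callable[[str], None]) -> bool:
    """Hypothetical model of steps 3031-3032; returns True if the BMC was notified."""
    if count < threshold:
        return False  # below threshold: step 3031 would not have raised an SMI
    if repairs_remaining > 0:
        return False  # repairs still available: apply one instead of warning
    # Step 3032: every repair method is used up, so send the warning notification.
    notify_bmc(f"memory pre-warning: {count} correctable errors >= threshold {threshold}")
    return True


sent = []  # stands in for the BMC message queue
print(smi_handler(30, 30, 1, sent.append))  # prints: False
print(smi_handler(30, 30, 0, sent.append))  # prints: True
print(len(sent))                            # prints: 1
```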
The memory warning method provided by this embodiment of the present invention ensures that the warnings it outputs are accurate and reliable in three respects:
A) High-frequency correctable errors greatly increase the probability of an uncorrectable error, so a warning is warranted;
B) The frequent error correction caused by high-frequency correctable errors can severely degrade system performance, so a warning is warranted;
C) Exceeding the threshold can trigger Device Tagging (enabling it is optional for the user, and it is on by default). Device Tagging resolves the failure of a single chip, but it also causes that memory rank to lose its single-bit error correction capability, leaving the rank on the edge of danger, so a warning is warranted.
In this embodiment of the present invention, the funnel frequency and warning threshold of the current warning mode are obtained; while the funnel counter counts the correctable errors of each memory module, subtraction operations are performed on it at the funnel frequency, and a warning is sent to the BMC when the counter reaches the warning threshold and all repair methods have been used. This implements warning on correctable memory errors along the time dimension and improves the accuracy and effectiveness of the warnings.
Embodiment 3
This embodiment of the present invention provides a memory warning apparatus. Referring to Fig. 4, the apparatus comprises:
an acquisition module 401, configured to obtain the currently set warning mode, and obtain the funnel frequency and warning threshold corresponding to the warning mode, where the funnel frequency is the number of subtraction operations performed per second on the preset funnel counter;
a processing module 402, configured to record, in the funnel counter, the number of correctable errors that occur in the currently monitored memory, and perform subtraction operations on the funnel counter according to the funnel frequency;
a warning module 403, configured to send a warning to the baseboard management controller BMC when it is detected that the count recorded in the funnel counter reaches the warning threshold and all repair methods have been used.
Optionally, the acquisition module 401 comprises:
an acquiring unit, configured to obtain the duration parameter and multiplier parameter corresponding to the warning mode;
a computing unit, configured to calculate the warning threshold corresponding to the warning mode according to the duration parameter, the multiplier parameter and the funnel frequency.
Optionally, the acquisition module 401 further comprises:
a setting unit, configured to set the warning threshold to the upper limit if the calculated warning threshold exceeds the upper limit.
Optionally, the warning modes include an accuracy mode and a coverage mode;
the funnel frequency of the accuracy mode is greater than that of the coverage mode;
the duration parameter of the accuracy mode is greater than that of the coverage mode.
Optionally, the funnel frequency is between 1 and 100 per second, the duration parameter is between 3 and 60 seconds, and the multiplier parameter is between 10 and 100.
Optionally, the warning module 403 comprises:
an interrupt unit, configured to trigger a system management interrupt (SMI) when it is detected that the count recorded in the funnel counter reaches the warning threshold;
a transmitting unit, configured to send a warning notification message to the BMC when the SMI interrupt handler detects that all repair methods have been used.
In this embodiment of the present invention, the funnel frequency and warning threshold of the current warning mode are obtained; while the funnel counter counts the correctable errors of each memory module, subtraction operations are performed on it at the funnel frequency, and a warning is sent to the BMC when the counter reaches the warning threshold and all repair methods have been used. This implements warning on correctable memory errors along the time dimension and improves the accuracy and effectiveness of the warnings.
Embodiment 4
This embodiment of the present invention provides a server.
Referring to Fig. 5 for its structure, the server comprises a memory 501 and at least one processor 502, and the processor 502 is configured to perform the following operations:
obtaining the currently set warning mode, and obtaining the funnel frequency and warning threshold corresponding to the warning mode, where the funnel frequency is the number of subtraction operations performed per second on the preset funnel counter;
recording, in the funnel counter, the number of correctable errors that occur in the currently monitored memory, and performing subtraction operations on the funnel counter according to the funnel frequency;
when it is detected that the count recorded in the funnel counter reaches the warning threshold and all repair methods have been used, sending a warning to the baseboard management controller BMC.
Obtaining the warning threshold corresponding to the warning mode comprises:
obtaining the duration parameter and multiplier parameter corresponding to the warning mode;
calculating the warning threshold corresponding to the warning mode according to the duration parameter, the multiplier parameter and the funnel frequency.
After the warning threshold corresponding to the warning mode is calculated, the operations further comprise:
if the calculated warning threshold exceeds the upper limit, setting the warning threshold to the upper limit.
The warning modes include an accuracy mode and a coverage mode;
the funnel frequency of the accuracy mode is greater than that of the coverage mode;
the duration parameter of the accuracy mode is greater than that of the coverage mode.
The funnel frequency is between 1 and 100 per second, the duration parameter is between 3 and 60 seconds, and the multiplier parameter is between 10 and 100.
Sending a warning to the BMC when it is detected that the count recorded in the funnel counter reaches the warning threshold and all repair methods have been used comprises:
when it is detected that the count recorded in the funnel counter reaches the warning threshold, triggering a system management interrupt (SMI);
when the SMI interrupt handler detects that all repair methods have been used, sending a warning notification message to the BMC.
In this embodiment of the present invention, the funnel frequency and warning threshold of the current warning mode are obtained; while the funnel counter counts the correctable errors of each memory module, subtraction operations are performed on it at the funnel frequency, and a warning is sent to the BMC when the counter reaches the warning threshold and all repair methods have been used. This implements warning on correctable memory errors along the time dimension and improves the accuracy and effectiveness of the warnings.
A person of ordinary skill in the art will understand that all or some of the steps of the foregoing embodiments may be implemented by hardware, or by a program instructing related hardware. The program may be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk or an optical disc.
The foregoing descriptions are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (12)

1. A memory warning method, characterized in that the method comprises:
obtaining the currently set warning mode, and obtaining the funnel frequency and warning threshold corresponding to the warning mode, where the funnel frequency is the number of subtraction operations performed per second on a preset funnel counter;
recording, in the funnel counter, the number of correctable errors that occur in the currently monitored memory, and performing subtraction operations on the funnel counter according to the funnel frequency; and
when it is detected that the count recorded in the funnel counter reaches the warning threshold and all repair methods have been used, sending a warning to a baseboard management controller BMC.
2. The method according to claim 1, characterized in that obtaining the warning threshold corresponding to the warning mode comprises:
obtaining a duration parameter and a multiple parameter corresponding to the warning mode; and
calculating the warning threshold corresponding to the warning mode according to the duration parameter, the multiple parameter, and the funnel frequency.
3. The method according to claim 2, characterized in that after the warning threshold corresponding to the warning mode is calculated, the method further comprises:
if the calculated warning threshold exceeds an upper limit, setting the warning threshold to the upper limit.
4. The method according to claim 2, characterized in that the warning modes comprise an accuracy mode and a coverage mode;
the funnel frequency corresponding to the accuracy mode is greater than the funnel frequency corresponding to the coverage mode; and
the duration parameter corresponding to the accuracy mode is greater than the duration parameter corresponding to the coverage mode.
5. The method according to claim 2, characterized in that the funnel frequency is between 1 and 100 per second, the duration parameter is between 3 and 60 seconds, and the multiple parameter is between 10 and 100.
6. The method according to claim 1, characterized in that sending a warning to the BMC when it is detected that the count recorded in the funnel counter reaches the warning threshold and all repair methods have been used comprises:
when it is detected that the count recorded in the funnel counter reaches the warning threshold, triggering a system management interrupt (SMI); and
when the SMI handler detects that all repair methods have been used, sending a warning notification message to the BMC.
7. A memory early warning apparatus, characterized in that the apparatus comprises:
an acquisition module, configured to obtain the currently set warning mode and to obtain the funnel frequency and the warning threshold corresponding to the warning mode, wherein the funnel frequency is the preset number of subtraction operations performed per second on a funnel counter;
a processing module, configured to record, in the funnel counter, the number of times a correctable error occurs in the currently monitored memory, and to perform subtraction operations on the funnel counter according to the funnel frequency; and
a warning module, configured to send a warning to a baseboard management controller (BMC) when it is detected that the count recorded in the funnel counter reaches the warning threshold and all repair methods have been used.
8. The apparatus according to claim 7, characterized in that the acquisition module comprises:
an acquiring unit, configured to obtain a duration parameter and a multiple parameter corresponding to the warning mode; and
a computing unit, configured to calculate the warning threshold corresponding to the warning mode according to the duration parameter, the multiple parameter, and the funnel frequency.
9. The apparatus according to claim 8, characterized in that the acquisition module further comprises:
a setting unit, configured to set the warning threshold to an upper limit if the calculated warning threshold exceeds the upper limit.
10. The apparatus according to claim 8, characterized in that the warning modes comprise an accuracy mode and a coverage mode;
the funnel frequency corresponding to the accuracy mode is greater than the funnel frequency corresponding to the coverage mode; and
the duration parameter corresponding to the accuracy mode is greater than the duration parameter corresponding to the coverage mode.
11. The apparatus according to claim 8, characterized in that the funnel frequency is between 1 and 100 per second, the duration parameter is between 3 and 60 seconds, and the multiple parameter is between 10 and 100.
12. The apparatus according to claim 7, characterized in that the warning module comprises:
an interrupt unit, configured to trigger a system management interrupt (SMI) when it is detected that the count recorded in the funnel counter reaches the warning threshold; and
a sending unit, configured to send a warning notification message to the BMC when the SMI handler detects that all repair methods have been used.
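Claims 2 to 6 leave several details open. Most importantly, they do not say how the duration parameter, the multiple parameter, and the funnel frequency combine into the warning threshold. The Python sketch below assumes, purely for illustration, that the threshold is their product. On that basis it:

- enforces the parameter ranges of claim 5;
- clamps the result to an upper limit, as in claim 3;
- gives the accuracy mode a larger funnel frequency and duration than the coverage mode, as claim 4 requires;
- models the claim 6 SMI handler as an ordinary function.

The mode values, the upper limit `0xFFFF`, the repair-method names, and the `notify_bmc` callback are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Mapping, Tuple

# Parameter ranges stated in claims 5 and 11.
FUNNEL_FREQUENCY_RANGE: Tuple[int, int] = (1, 100)  # subtractions per second
DURATION_RANGE: Tuple[int, int] = (3, 60)           # seconds
MULTIPLE_RANGE: Tuple[int, int] = (10, 100)

# HYPOTHETICAL upper limit (e.g. a 16-bit counter); the text gives no value.
THRESHOLD_UPPER_LIMIT = 0xFFFF


@dataclass(frozen=True)
class ModeParams:
    funnel_frequency: int
    duration: int
    multiple: int


class WarningMode(Enum):
    # Illustrative values only. Claim 4 requires the accuracy mode to have a
    # greater funnel frequency and a greater duration than the coverage mode.
    ACCURACY = ModeParams(funnel_frequency=20, duration=30, multiple=10)
    COVERAGE = ModeParams(funnel_frequency=5, duration=10, multiple=10)


def _check_range(name: str, value: int, bounds: Tuple[int, int]) -> None:
    low, high = bounds
    if not low <= value <= high:
        raise ValueError(f"{name}={value} is outside [{low}, {high}]")


def compute_threshold(params: ModeParams,
                      upper_limit: int = THRESHOLD_UPPER_LIMIT) -> int:
    _check_range("funnel_frequency", params.funnel_frequency, FUNNEL_FREQUENCY_RANGE)
    _check_range("duration", params.duration, DURATION_RANGE)
    _check_range("multiple", params.multiple, MULTIPLE_RANGE)
    # ASSUMED combination: the claims do not state the actual formula.
    threshold = params.funnel_frequency * params.duration * params.multiple
    # Claim 3: if the computed threshold exceeds the upper limit, use the limit.
    return min(threshold, upper_limit)


def warning_threshold(mode: WarningMode) -> int:
    return compute_threshold(mode.value)


def smi_handler(repairs_used: Mapping[str, bool],
                notify_bmc: Callable[[str], None]) -> bool:
    """Sketch of claim 6: runs after the funnel counter reaches its threshold.

    repairs_used maps hypothetical repair-method names to whether each one has
    already been applied; notify_bmc stands in for the real BMC channel.
    Returns True if a warning notification was sent.
    """
    if all(repairs_used.values()):
        notify_bmc("memory correctable-error warning")
        return True
    return False
```

Under these assumptions, the accuracy mode yields a threshold of 6000 and the coverage mode a threshold of 500. The coverage mode therefore warns earlier and more often, while the accuracy mode requires a higher, longer-sustained error rate before alerting. A call such as `smi_handler({"ppr": True, "page_offline": True}, print)` prints the notification, because every (hypothetical) repair method is marked as used.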
CN201510500335.4A 2015-08-14 2015-08-14 Memory warning method and apparatus Active CN105117301B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510500335.4A CN105117301B (en) 2015-08-14 2015-08-14 Memory warning method and apparatus

Publications (2)

Publication Number Publication Date
CN105117301A 2015-12-02
CN105117301B 2018-08-14

Family

ID=54665301

Country Status (1)

Country Link
CN (1) CN105117301B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110008090B (en) * 2019-04-15 2020-10-02 苏州浪潮智能科技有限公司 Method and device for monitoring memory errors and computer readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102681909A (en) * 2012-04-28 2012-09-19 浪潮电子信息产业股份有限公司 Server early-warning method based on memory errors
JP5078582B2 (en) * 2007-12-10 2012-11-21 オムロンオートモーティブエレクトロニクス株式会社 Motor control device
CN103092739A (en) * 2013-01-18 2013-05-08 浪潮电子信息产业股份有限公司 Memory error checking and correcting (ECC) error reporting and alarm mechanism
CN103605602A (en) * 2013-11-29 2014-02-26 中国航空工业集团公司第六三一研究所 Method for filtering out malfunctions of distributed computer system

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107590047B (en) * 2016-07-08 2021-02-12 佛山市顺德区顺达电脑厂有限公司 SMI signal timeout monitoring system and method
CN107590047A (en) * 2016-07-08 2018-01-16 佛山市顺德区顺达电脑厂有限公司 Monitoring system and method for SMI signal timeout
CN106445720A (en) * 2016-10-11 2017-02-22 郑州云海信息技术有限公司 Memory error recovery method and device
US10430260B2 (en) 2016-12-05 2019-10-01 Huawei Technologies Co., Ltd. Troubleshooting method, computer system, baseboard management controller, and system
CN113407391A (en) * 2016-12-05 2021-09-17 华为技术有限公司 Fault processing method, computer system, baseboard management controller and system
WO2019061517A1 (en) * 2017-09-30 2019-04-04 华为技术有限公司 Memory fault detection method and device, and server
US11119874B2 (en) 2017-09-30 2021-09-14 Huawei Technologies Co., Ltd. Memory fault detection
CN109992477A (en) * 2019-03-27 2019-07-09 联想(北京)有限公司 Information processing method, system and electronic equipment for electronic equipment
CN110780646A (en) * 2019-09-21 2020-02-11 苏州浪潮智能科技有限公司 Memory quality early warning method based on MES system
CN111008091A (en) * 2019-12-06 2020-04-14 苏州浪潮智能科技有限公司 Fault processing method, system and related device for memory CE
CN111459557A (en) * 2020-03-12 2020-07-28 烽火通信科技股份有限公司 Method and system for shortening starting time of server
WO2023044832A1 (en) * 2021-09-25 2023-03-30 Intel Corporation Apparatus, computer-readable medium, and method for increasing memory error handling accuracy
CN115543677A (en) * 2022-11-29 2022-12-30 苏州浪潮智能科技有限公司 Correctable error processing method, device and equipment and readable storage medium

Also Published As

Publication number Publication date
CN105117301B (en) 2018-08-14

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20200420

Address after: 518129 Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen

Patentee after: HUAWEI TECHNOLOGIES Co.,Ltd.

Address before: 301, A building, room 3, building 301, foreshore Road, No. 310052, Binjiang District, Zhejiang, Hangzhou

Patentee before: Hangzhou Huawei Digital Technologies Co., Ltd.

TR01 Transfer of patent right

Effective date of registration: 20211223

Address after: 450046 Floor 9, Building 1, Zhengshang Boya Plaza, Longzihu Wisdom Island, Zhengdong New Area, Zhengzhou City, Henan Province

Patentee after: XFusion Digital Technologies Co., Ltd.

Address before: 518129 Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen

Patentee before: HUAWEI TECHNOLOGIES Co.,Ltd.

TR01 Transfer of patent right