Summary of the Invention
To solve the problems of the prior art, embodiments of the present invention provide a method and a device for memory early warning. The technical solution is as follows:
In a first aspect, an embodiment of the present invention provides a method for memory early warning, the method comprising:
obtaining the currently set warning mode, and obtaining the funnel frequency and the warning threshold corresponding to the warning mode, where the funnel frequency is the value subtracted per second from a preset funnel counter;
recording, in the funnel counter, the number of correctable errors occurring in the currently monitored memory, and performing subtraction on the funnel counter according to the funnel frequency;
when it is detected that the count recorded in the funnel counter has reached the warning threshold and all repair modes have been used, sending an early warning to the baseboard management controller (BMC).
In a first possible implementation of the first aspect, obtaining the warning threshold corresponding to the warning mode comprises:
obtaining the duration parameter and the multiple parameter corresponding to the warning mode;
calculating the warning threshold corresponding to the warning mode from the duration parameter, the multiple parameter, and the funnel frequency.
With reference to the first possible implementation of the first aspect, in a second possible implementation of the first aspect, after the warning threshold corresponding to the warning mode is calculated, the method further comprises:
if the calculated warning threshold exceeds an upper limit, setting the warning threshold to the upper limit.
With reference to the first possible implementation of the first aspect, in a third possible implementation of the first aspect, the warning modes comprise an accuracy mode and a coverage mode;
the funnel frequency corresponding to the accuracy mode is greater than the funnel frequency corresponding to the coverage mode;
the duration parameter corresponding to the accuracy mode is greater than the duration parameter corresponding to the coverage mode.
With reference to the first possible implementation of the first aspect, in a fourth possible implementation of the first aspect, the funnel frequency is between 1 and 100 per second; the duration parameter is between 3 and 60 seconds; and the multiple parameter is between 10 and 100.
In a fifth possible implementation of the first aspect, sending an early warning to the BMC when it is detected that the count recorded in the funnel counter has reached the warning threshold and all repair modes have been used comprises:
when it is detected that the count recorded in the funnel counter has reached the warning threshold, triggering a system management interrupt (SMI);
when the SMI interrupt handler detects that all repair modes have been used, sending an early-warning notification message to the BMC.
In a second aspect, an embodiment of the present invention provides a device for memory early warning, the device comprising:
an acquisition module, configured to obtain the currently set warning mode, and to obtain the funnel frequency and the warning threshold corresponding to the warning mode, where the funnel frequency is the value subtracted per second from a preset funnel counter;
a processing module, configured to record, in the funnel counter, the number of correctable errors occurring in the currently monitored memory, and to perform subtraction on the funnel counter according to the funnel frequency;
a warning module, configured to send an early warning to the baseboard management controller (BMC) when it is detected that the count recorded in the funnel counter has reached the warning threshold and all repair modes have been used.
In a first possible implementation of the second aspect, the acquisition module comprises:
an acquiring unit, configured to obtain the duration parameter and the multiple parameter corresponding to the warning mode;
a computing unit, configured to calculate the warning threshold corresponding to the warning mode from the duration parameter, the multiple parameter, and the funnel frequency.
With reference to the first possible implementation of the second aspect, in a second possible implementation of the second aspect, the acquisition module further comprises:
a setting unit, configured to set the warning threshold to an upper limit if the calculated warning threshold exceeds the upper limit.
With reference to the first possible implementation of the second aspect, in a third possible implementation of the second aspect, the warning modes comprise an accuracy mode and a coverage mode;
the funnel frequency corresponding to the accuracy mode is greater than the funnel frequency corresponding to the coverage mode;
the duration parameter corresponding to the accuracy mode is greater than the duration parameter corresponding to the coverage mode.
With reference to the first possible implementation of the second aspect, in a fourth possible implementation of the second aspect, the funnel frequency is between 1 and 100 per second; the duration parameter is between 3 and 60 seconds; and the multiple parameter is between 10 and 100.
In a fifth possible implementation of the second aspect, the warning module comprises:
an interrupt unit, configured to trigger a system management interrupt (SMI) when it is detected that the count recorded in the funnel counter has reached the warning threshold;
a sending unit, configured to send an early-warning notification message to the BMC when the SMI interrupt handler detects that all repair modes have been used.
The technical solution provided by the embodiments of the present invention has the following beneficial effects:
The funnel frequency and the warning threshold of the current warning mode are obtained; while the funnel counter counts the correctable errors of each memory module, subtraction is performed on the funnel counter according to the funnel frequency; and when the funnel counter reaches the warning threshold and all repair modes have been used, an early warning is sent to the BMC. This realizes early warning for correctable memory errors along the time dimension and improves the accuracy and effectiveness of the early warning.
Detailed Description of the Embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
Embodiment 1
This embodiment of the present invention provides a method for memory early warning; see Fig. 1.
The method comprises:
101: obtaining the currently set warning mode, and obtaining the funnel frequency and the warning threshold corresponding to the warning mode, where the funnel frequency is the value subtracted per second from a preset funnel counter;
102: recording, in the funnel counter, the number of correctable errors occurring in the currently monitored memory, and performing subtraction on the funnel counter according to the funnel frequency;
103: when it is detected that the count recorded in the funnel counter has reached the warning threshold and all repair modes have been used, sending an early warning to the BMC (Baseboard Management Controller).
Fig. 2 shows a mathematical model for memory failure early warning. In this model, the horizontal axis represents the frequency at which correctable errors occur, and the vertical axis represents the likelihood of a catastrophic system failure at that ECC error frequency. A qualitative analysis of memory failure using this model leads to the following conclusions:
The higher the frequency at which correctable errors occur, the greater the likelihood of a catastrophic system failure;
The larger the funnel frequency threshold selected for the frequency of correctable errors, the higher the accuracy of the early warning and the lower its coverage.
In this embodiment of the present invention, the funnel frequency and the warning threshold of the current warning mode are obtained; while the funnel counter counts the correctable errors of each memory module, subtraction is performed on the funnel counter according to the funnel frequency; and when the funnel counter reaches the warning threshold and all repair modes have been used, an early warning is sent to the BMC. This realizes early warning for correctable memory errors along the time dimension and improves the accuracy and effectiveness of the early warning.
Embodiment 2
This embodiment of the present invention provides a method for memory early warning; see Fig. 3.
The method comprises:
301: obtaining the currently set warning mode, and obtaining the funnel frequency and the warning threshold corresponding to the warning mode, where the funnel frequency is the value subtracted per second from a preset funnel counter.
The warning modes comprise an accuracy mode and a coverage mode.
The memory errors targeted in accuracy mode have the following characteristics: the errors occur at a high frequency and persist for some time. The memory errors targeted in coverage mode have the following characteristic: the funnel frequency that is set is below a certain threshold.
Each warning mode includes settable parameters, which may include the funnel frequency and the warning threshold. Optionally, the warning threshold may be calculated from other parameters of the warning mode. Accordingly, the parameters used to calculate the warning threshold may include a duration parameter and a multiple parameter. The duration parameter is the length of time over which memory errors persist: if the duration is too short, it cannot be confirmed whether a fault has occurred, and the longer the duration, the more accurate the judgment.
Therefore, comparing the accuracy mode with the coverage mode leads to the following conclusions:
The funnel frequency corresponding to the accuracy mode is greater than the funnel frequency corresponding to the coverage mode;
The higher the accuracy requirement, the larger the funnel frequency.
The duration parameter corresponding to the accuracy mode is greater than the duration parameter corresponding to the coverage mode.
The funnel frequency is between 1 and 100 per second; the duration parameter is between 3 and 60 seconds; and the multiple parameter is between 10 and 100.
Because the frequency at which correctable errors actually occur is greater than the funnel frequency, and to allow for higher correctable-error frequencies, the warning threshold needs to be amplified by a factor of 10 to 100 through the multiple parameter; the higher the accuracy requirement, the larger the multiple.
Based on the characteristics of each warning mode, server administrators can select, in the server management settings, the warning mode that meets their current needs.
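As an illustration of how the settable parameters of a warning mode might be represented, the following Python sketch validates each parameter against the ranges given above. The class name `WarningMode`, the field names, and the two preset values are assumptions made for this example; the text fixes only the ranges and the ordering between the two modes, not concrete values.

```python
from dataclasses import dataclass

# Parameter ranges as stated in the text.
FUNNEL_FREQUENCY_RANGE = (1, 100)  # subtractions per second
DURATION_RANGE = (3, 60)           # seconds
MULTIPLE_RANGE = (10, 100)         # dimensionless factor


@dataclass(frozen=True)
class WarningMode:
    name: str
    funnel_frequency: int  # value subtracted from the funnel counter per second
    duration: int          # duration parameter, in seconds
    multiple: int          # multiple parameter

    def __post_init__(self):
        # Reject any parameter that falls outside the range given in the text.
        checks = (
            ("funnel_frequency", self.funnel_frequency, FUNNEL_FREQUENCY_RANGE),
            ("duration", self.duration, DURATION_RANGE),
            ("multiple", self.multiple, MULTIPLE_RANGE),
        )
        for field_name, value, (low, high) in checks:
            if not low <= value <= high:
                raise ValueError(f"{field_name}={value} is outside [{low}, {high}]")


# Hypothetical presets, chosen only to satisfy the ordering between the modes:
# accuracy mode has the larger funnel frequency and the larger duration.
ACCURACY_MODE = WarningMode("accuracy", funnel_frequency=50, duration=60, multiple=100)
COVERAGE_MODE = WarningMode("coverage", funnel_frequency=5, duration=10, multiple=10)
```

Validating at construction time means an out-of-range setting is rejected when the mode is defined rather than when a threshold is later computed from it.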
In the funnel counting mechanism for correctable memory errors, the funnel frequency is also a settable parameter. Once the funnel frequency is set, the funnel counter, while recording each monitored occurrence of a correctable memory error, also has subtraction performed on it according to the funnel frequency, which reduces the count recorded in the funnel counter.
The causes of memory faults fall broadly into soft faults (such as bit flips caused by cosmic rays), transient faults (such as crosstalk on data lines), and hard faults (such as damage to or failure of a bit in a memory chip).
A soft fault may cause a large number of memory errors to be reported momentarily rather than persistently. Such memory errors are repaired quickly and do not recur after a certain time. However, the errors caused by soft faults are still recorded in the funnel counter, and once the recorded count reaches the specified threshold, an early warning is triggered, which is a false alarm. By using the funnel frequency to perform subtraction on the funnel counter, the count of soft-fault errors recorded in the funnel counter is reduced over time, so that after a while it can no longer trigger an early warning.
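The effect described above can be sketched with a small simulation. This is a minimal model under stated assumptions, not the patented implementation: time advances in one-second ticks, each tick first adds that second's correctable errors and then subtracts the funnel frequency, and the counter is assumed never to go below zero. The function name `simulate` and the sample numbers are hypothetical.

```python
def simulate(errors_per_second, funnel_frequency, threshold):
    """Run the funnel counter over a sequence of per-second error counts.

    Returns (tripped_at, history): the index of the first second in which the
    counter reached the threshold (or None), and the counter value after each second.
    """
    counter = 0
    history = []
    tripped_at = None
    for second, errors in enumerate(errors_per_second):
        counter += errors                              # record new correctable errors
        counter = max(0, counter - funnel_frequency)   # funnel subtraction, floored at 0 (assumption)
        history.append(counter)
        if tripped_at is None and counter >= threshold:
            tripped_at = second
    return tripped_at, history


# A soft-fault burst: 200 errors in one second, then nothing for 29 seconds.
burst_trip, burst_history = simulate([200] + [0] * 29, funnel_frequency=10, threshold=500)

# A persistent fault: 60 errors every second for 30 seconds.
steady_trip, steady_history = simulate([60] * 30, funnel_frequency=10, threshold=500)
```

In this model the burst drains back to zero without ever reaching the threshold, while the persistent error stream gains a net 50 counts per second and reaches the threshold in its tenth second.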
In this embodiment of the present invention, step 301 can be implemented through the following steps:
3011: obtaining the currently set warning mode;
3012: obtaining the funnel frequency corresponding to the warning mode;
3013: obtaining the duration parameter and the multiple parameter corresponding to the warning mode;
3014: calculating the warning threshold corresponding to the warning mode from the duration parameter, the multiple parameter, and the funnel frequency.
The warning threshold can be calculated as:
warning threshold = duration parameter × funnel frequency × multiple parameter;
For example, with funnel frequency = 10 per second, duration parameter = 50 seconds, and multiple parameter = 10, the warning threshold is 5000.
In addition, for the extreme case in which the accuracy requirement is low and missed warnings must be kept to a minimum:
funnel frequency = 1 per second, duration parameter = 3 seconds, and multiple parameter = 10, so the warning threshold is 30;
A typical real system requires fairly high accuracy while also keeping missed warnings low, so a practical balance can be set as follows:
funnel frequency = 1 per second, duration parameter = 60 seconds, and multiple parameter = 100, so the warning threshold is 6000.
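The formula and the three worked examples above can be checked with a few lines of Python. The function name `warning_threshold` and its parameter names are chosen for this illustration only.

```python
def warning_threshold(duration_s, funnel_frequency, multiple):
    """warning threshold = duration parameter x funnel frequency x multiple parameter."""
    return duration_s * funnel_frequency * multiple


# The three parameter sets given in the text.
first_example = warning_threshold(duration_s=50, funnel_frequency=10, multiple=10)
low_accuracy_extreme = warning_threshold(duration_s=3, funnel_frequency=1, multiple=10)
practical_balance = warning_threshold(duration_s=60, funnel_frequency=1, multiple=100)
```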
Further, an extreme case with a very high accuracy requirement may make the calculated warning threshold too large, so an upper limit needs to be imposed on the warning threshold. If the warning threshold is found to be greater than the upper limit, it needs to be reset to the upper limit; in that case, step 3015 is performed.
3015: if the calculated warning threshold exceeds the upper limit, setting the warning threshold to the upper limit.
For example, in the extreme case of a very high accuracy requirement:
funnel frequency = 100 per second, duration parameter = 60 seconds, and multiple parameter = 100, so the calculated warning threshold is 600000; the upper limit is 32767, so the warning threshold is set to 32767.
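Step 3015 amounts to clamping the calculated value. The sketch below uses the upper limit of 32767 from the example above as a default; the function name is hypothetical. Note that 32767 equals 2^15 − 1, although the text does not say why this value was chosen.

```python
UPPER_LIMIT = 32767  # upper limit used in the text's example


def clamped_warning_threshold(duration_s, funnel_frequency, multiple, upper_limit=UPPER_LIMIT):
    """Compute the warning threshold, then cap it at the upper limit (step 3015)."""
    raw = duration_s * funnel_frequency * multiple
    return min(raw, upper_limit)


# High-accuracy extreme from the text: the raw value 600000 exceeds the limit.
high_accuracy_extreme = clamped_warning_threshold(60, 100, 100)

# A value below the limit passes through unchanged.
unclamped = clamped_warning_threshold(50, 10, 10)
```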
302: recording, in the funnel counter, the number of correctable errors occurring in the currently monitored memory, and performing subtraction on the funnel counter according to the funnel frequency.
For example, if correctable errors occur at 100 per second and the funnel frequency is 50 per second, the funnel counter increases by a net 50 per second.
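Because the counter grows by the difference between the error rate and the funnel frequency, the time needed to reach a given warning threshold follows directly. The sketch below assumes a constant error rate and a counter that starts at zero; both are simplifications for illustration, and the function name is hypothetical.

```python
def seconds_to_threshold(error_rate, funnel_frequency, threshold):
    """Whole seconds until the funnel counter reaches the threshold.

    Assumes a constant error rate (errors per second) and a counter starting at 0.
    Returns None when the error rate does not exceed the funnel frequency, because
    the counter then never accumulates.
    """
    net_per_second = error_rate - funnel_frequency
    if net_per_second <= 0:
        return None
    # Integer ceiling division: smallest n with n * net_per_second >= threshold.
    return -(-threshold // net_per_second)


# Rates from the example above (100 errors/s, funnel frequency 50/s).
time_to_5000 = seconds_to_threshold(100, 50, 5000)
time_to_cap = seconds_to_threshold(100, 50, 32767)
never = seconds_to_threshold(50, 50, 5000)
```

This makes the time dimension explicit: a threshold is only reached if errors persist at a rate above the funnel frequency for long enough.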
303: when it is detected that the count recorded in the funnel counter has reached the warning threshold and all repair modes have been used, sending an early warning to the BMC.
When the count recorded in the funnel counter reaches the warning threshold, RAS (Reliability, Availability, Serviceability) memory repair operations can be started. RAS memory repair operations can resolve memory chip failures, but at a corresponding cost.
When the count recorded in the funnel counter reaches the warning threshold and all the repair modes have been used, the DeviceTagging function needs to be performed. DeviceTagging is a handling function that can be used only as a last resort, after all the repair functions above have been exhausted, and it sacrifices the ability to correct single-bit errors.
If DeviceTagging has already been performed, all RAS handling actions have been exhausted, and the SMI interrupt handler sends an early warning to notify the BMC.
The early warning can be sent as follows:
3031: when it is detected that the count recorded in the funnel counter has reached the warning threshold, triggering an SMI (System Management Interrupt);
3032: when the SMI interrupt handler detects that all repair modes have been used, sending an early-warning notification message to the BMC.
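The two-stage flow of steps 3031 and 3032 can be modelled as a handler that is invoked once per SMI. This is a sketch under assumptions: the text does not specify the order in which repair modes are tried or whether the warning is sent in the same interrupt that uses the last repair mode, so the behaviour below is one plausible reading. The names `handle_smi`, `remaining_repairs`, and `send_to_bmc`, and the message text, are hypothetical.

```python
def handle_smi(remaining_repairs, send_to_bmc):
    """Model one SMI raised because the funnel counter reached the warning threshold.

    remaining_repairs: list of repair modes not yet used, in the order they are tried.
    send_to_bmc: callable that delivers a notification message to the BMC.
    Returns the repair mode applied during this interrupt, or None if none was left.
    """
    applied = remaining_repairs.pop(0) if remaining_repairs else None
    if not remaining_repairs:
        # Every repair mode has now been used, so notify the BMC (assumed timing).
        send_to_bmc("memory early warning: all repair modes used")
    return applied


# Usage: a list stands in for the BMC message channel.
bmc_messages = []
repairs = ["hypothetical_repair", "DeviceTagging"]
first = handle_smi(repairs, bmc_messages.append)   # applies a repair, no warning yet
second = handle_smi(repairs, bmc_messages.append)  # applies DeviceTagging, then warns
```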
The memory early warning method provided by this embodiment of the present invention ensures that the early warnings it outputs are accurate and reliable in three respects:
a) high-frequency correctable errors greatly increase the likelihood of uncorrectable errors, so a warning is warranted;
b) the frequent error correction caused by high-frequency correctable errors seriously degrades system performance, so a warning is warranted;
c) after the threshold is exceeded, DeviceTagging may be triggered (whether DeviceTagging is enabled is user-selectable; it is enabled by default). DeviceTagging resolves the failure of a single memory chip, but it also causes the memory rank to lose its single-bit error correction capability, leaving it on the edge of danger, so a warning is warranted.
In this embodiment of the present invention, the funnel frequency and the warning threshold of the current warning mode are obtained; while the funnel counter counts the correctable errors of each memory module, subtraction is performed on the funnel counter according to the funnel frequency; and when the funnel counter reaches the warning threshold and all repair modes have been used, an early warning is sent to the BMC. This realizes early warning for correctable memory errors along the time dimension and improves the accuracy and effectiveness of the early warning.
Embodiment 3
This embodiment of the present invention provides a device for memory early warning; see Fig. 4. The device comprises:
an acquisition module 401, configured to obtain the currently set warning mode, and to obtain the funnel frequency and the warning threshold corresponding to the warning mode, where the funnel frequency is the value subtracted per second from a preset funnel counter;
a processing module 402, configured to record, in the funnel counter, the number of correctable errors occurring in the currently monitored memory, and to perform subtraction on the funnel counter according to the funnel frequency;
a warning module 403, configured to send an early warning to the baseboard management controller (BMC) when it is detected that the count recorded in the funnel counter has reached the warning threshold and all repair modes have been used.
Optionally, the acquisition module 401 comprises:
an acquiring unit, configured to obtain the duration parameter and the multiple parameter corresponding to the warning mode;
a computing unit, configured to calculate the warning threshold corresponding to the warning mode from the duration parameter, the multiple parameter, and the funnel frequency.
Optionally, the acquisition module 401 further comprises:
a setting unit, configured to set the warning threshold to an upper limit if the calculated warning threshold exceeds the upper limit.
Optionally, the warning modes comprise an accuracy mode and a coverage mode;
the funnel frequency corresponding to the accuracy mode is greater than the funnel frequency corresponding to the coverage mode;
the duration parameter corresponding to the accuracy mode is greater than the duration parameter corresponding to the coverage mode.
Optionally, the funnel frequency is between 1 and 100 per second; the duration parameter is between 3 and 60 seconds; and the multiple parameter is between 10 and 100.
Optionally, the warning module 403 comprises:
an interrupt unit, configured to trigger a system management interrupt (SMI) when it is detected that the count recorded in the funnel counter has reached the warning threshold;
a sending unit, configured to send an early-warning notification message to the BMC when the SMI interrupt handler detects that all repair modes have been used.
In this embodiment of the present invention, the funnel frequency and the warning threshold of the current warning mode are obtained; while the funnel counter counts the correctable errors of each memory module, subtraction is performed on the funnel counter according to the funnel frequency; and when the funnel counter reaches the warning threshold and all repair modes have been used, an early warning is sent to the BMC. This realizes early warning for correctable memory errors along the time dimension and improves the accuracy and effectiveness of the early warning.
Embodiment 4
This embodiment of the present invention provides a server whose structure is shown in Fig. 5. The server comprises a memory 501 and at least one processor 502, and the processor 502 is configured to perform the following operations:
obtaining the currently set warning mode, and obtaining the funnel frequency and the warning threshold corresponding to the warning mode, where the funnel frequency is the value subtracted per second from a preset funnel counter;
recording, in the funnel counter, the number of correctable errors occurring in the currently monitored memory, and performing subtraction on the funnel counter according to the funnel frequency;
when it is detected that the count recorded in the funnel counter has reached the warning threshold and all repair modes have been used, sending an early warning to the baseboard management controller (BMC).
Obtaining the warning threshold corresponding to the warning mode comprises:
obtaining the duration parameter and the multiple parameter corresponding to the warning mode;
calculating the warning threshold corresponding to the warning mode from the duration parameter, the multiple parameter, and the funnel frequency.
After the warning threshold corresponding to the warning mode is calculated, the operations further comprise:
if the calculated warning threshold exceeds an upper limit, setting the warning threshold to the upper limit.
The warning modes comprise an accuracy mode and a coverage mode;
the funnel frequency corresponding to the accuracy mode is greater than the funnel frequency corresponding to the coverage mode;
the duration parameter corresponding to the accuracy mode is greater than the duration parameter corresponding to the coverage mode.
The funnel frequency is between 1 and 100 per second; the duration parameter is between 3 and 60 seconds; and the multiple parameter is between 10 and 100.
Sending an early warning to the BMC when it is detected that the count recorded in the funnel counter has reached the warning threshold and all repair modes have been used comprises:
when it is detected that the count recorded in the funnel counter has reached the warning threshold, triggering a system management interrupt (SMI);
when the SMI interrupt handler detects that all repair modes have been used, sending an early-warning notification message to the BMC.
In this embodiment of the present invention, the funnel frequency and the warning threshold of the current warning mode are obtained; while the funnel counter counts the correctable errors of each memory module, subtraction is performed on the funnel counter according to the funnel frequency; and when the funnel counter reaches the warning threshold and all repair modes have been used, an early warning is sent to the BMC. This realizes early warning for correctable memory errors along the time dimension and improves the accuracy and effectiveness of the early warning.
A person of ordinary skill in the art will understand that all or some of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory (ROM), a magnetic disk, an optical disc, or the like.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.