CN105117301B - Method and device for memory early warning - Google Patents

Publication number: CN105117301B
Authority: CN (China)
Prior art keywords: warning, frequency, funnel, threshold value, modes
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN201510500335.4A
Other languages: Chinese (zh)
Other versions: CN105117301A (en)
Inventor: 宋刚
Current assignee: XFusion Digital Technologies Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original assignee: Hangzhou Huawei Digital Technologies Co Ltd
Events: application filed by Hangzhou Huawei Digital Technologies Co Ltd; priority to CN201510500335.4A; publication of CN105117301A; application granted; publication of CN105117301B

Landscapes

  • Debugging And Monitoring (AREA)
  • Filling Or Emptying Of Bunkers, Hoppers, And Tanks (AREA)

Abstract

The invention discloses a method and device for memory early warning, belonging to the field of server technology. The method includes: obtaining the currently set warning mode, and obtaining the funnel frequency and warning threshold corresponding to that mode, where the funnel frequency is the value subtracted each second from a preset leaky bucket counter; recording in the leaky bucket counter the number of correctable errors that occur in the currently monitored memory, and decrementing the counter according to the funnel frequency; and, when the count recorded in the leaky bucket counter is found to have reached the warning threshold and every repair mode has been used up, sending an early warning to the baseboard management controller (BMC). The invention thereby brings the time dimension into warnings on correctable memory errors, improving the accuracy and effectiveness of early warning.

Description

Method and device for memory early warning
Technical field
The present invention relates to the field of server technology, and in particular to a method and device for memory early warning.
Background
As server technology develops, servers are configured with ever larger memory capacities and memory runs at ever higher speeds. This large-capacity, high-speed memory has become the failure-prone area with the greatest impact on system stability. Predicting and handling memory problems before a serious failure occurs has become both an important requirement for system stability and reliability and a technical difficulty.
Currently, a leaky bucket counter is provided in the server, which records the number of correctable errors that occur in each memory module. The BMC polls the leaky bucket counter periodically; when it finds that the number of correctable errors has reached a preset threshold, it triggers an early warning to prompt the server administrator to troubleshoot.
In the course of implementing the invention, the inventor found that the prior art has at least the following problem:
Some memory faults are soft faults (for example, bit flips caused by cosmic rays). Such faults return to normal within a certain time and are not uncorrectable errors. Issuing warnings by pure counting ignores the time dimension, which leads to false alarms and in turn makes server operation and maintenance inefficient.
Summary of the invention
To solve the problems in the prior art, embodiments of the present invention provide a method and device for memory early warning. The technical solution is as follows:
In a first aspect, an embodiment of the present invention provides a method for memory early warning, the method including:
obtaining the currently set warning mode, and obtaining the funnel frequency and warning threshold corresponding to the warning mode, where the funnel frequency is the value subtracted each second from a preset leaky bucket counter;
recording in the leaky bucket counter the number of correctable errors that occur in the currently monitored memory, and decrementing the leaky bucket counter according to the funnel frequency;
when the count recorded in the leaky bucket counter is found to have reached the warning threshold and every repair mode has been used up, sending an early warning to the baseboard management controller (BMC).
In a first possible implementation of the first aspect, obtaining the warning threshold corresponding to the warning mode includes:
obtaining the duration parameter and multiple parameter corresponding to the warning mode;
calculating the warning threshold corresponding to the warning mode from the duration parameter, the multiple parameter and the funnel frequency.
With reference to the first possible implementation of the first aspect, in a second possible implementation of the first aspect, after the warning threshold corresponding to the warning mode is calculated, the method further includes:
if the calculated warning threshold exceeds an upper limit, setting the warning threshold to the upper limit.
With reference to the first possible implementation of the first aspect, in a third possible implementation of the first aspect, the warning modes include an accuracy mode and a coverage mode;
the funnel frequency of the accuracy mode is greater than the funnel frequency of the coverage mode;
the duration parameter of the accuracy mode is greater than the duration parameter of the coverage mode.
With reference to the first possible implementation of the first aspect, in a fourth possible implementation of the first aspect, the funnel frequency is between 1 per second and 100 per second; the duration parameter is between 3 seconds and 60 seconds; and the multiple parameter is between 10 and 100.
In a fifth possible implementation of the first aspect, sending an early warning to the BMC when the count recorded in the leaky bucket counter is found to have reached the warning threshold and every repair mode has been used up includes:
when the count recorded in the leaky bucket counter is found to have reached the warning threshold, triggering a system management interrupt (SMI);
when the SMI interrupt handler detects that every repair mode has been used up, sending an early-warning notification message to the BMC.
In a second aspect, an embodiment of the present invention provides a device for memory early warning, the device including:
an acquisition module, configured to obtain the currently set warning mode and to obtain the funnel frequency and warning threshold corresponding to the warning mode, where the funnel frequency is the value subtracted each second from a preset leaky bucket counter;
a processing module, configured to record in the leaky bucket counter the number of correctable errors that occur in the currently monitored memory, and to decrement the leaky bucket counter according to the funnel frequency;
a warning module, configured to send an early warning to the baseboard management controller (BMC) when the count recorded in the leaky bucket counter is found to have reached the warning threshold and every repair mode has been used up.
In a first possible implementation of the second aspect, the acquisition module includes:
an acquiring unit, configured to obtain the duration parameter and multiple parameter corresponding to the warning mode;
a computing unit, configured to calculate the warning threshold corresponding to the warning mode from the duration parameter, the multiple parameter and the funnel frequency.
With reference to the first possible implementation of the second aspect, in a second possible implementation of the second aspect, the acquisition module further includes:
a setting unit, configured to set the warning threshold to an upper limit if the calculated warning threshold exceeds the upper limit.
With reference to the first possible implementation of the second aspect, in a third possible implementation of the second aspect, the warning modes include an accuracy mode and a coverage mode;
the funnel frequency of the accuracy mode is greater than the funnel frequency of the coverage mode;
the duration parameter of the accuracy mode is greater than the duration parameter of the coverage mode.
With reference to the first possible implementation of the second aspect, in a fourth possible implementation of the second aspect, the funnel frequency is between 1 per second and 100 per second; the duration parameter is between 3 seconds and 60 seconds; and the multiple parameter is between 10 and 100.
In a fifth possible implementation of the second aspect, the warning module includes:
an interrupt unit, configured to trigger a system management interrupt (SMI) when the count recorded in the leaky bucket counter is found to have reached the warning threshold;
a sending unit, configured to send an early-warning notification message to the BMC when the SMI interrupt handler detects that every repair mode has been used up.
The beneficial effects of the technical solution provided by the embodiments of the present invention are as follows:
By obtaining the funnel frequency and warning threshold of the current warning mode, decrementing the leaky bucket counter at the funnel frequency while it counts the correctable errors of each memory module, and sending an early warning to the BMC when the counter reaches the warning threshold and every repair mode has been used up, the solution brings the time dimension into warnings on correctable memory errors and improves the accuracy and effectiveness of early warning.
Description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below are evidently only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of the memory early warning method provided by Embodiment 1 of the present invention;
Fig. 2 is a schematic diagram of the mathematical model for memory failure early warning used in the memory early warning method provided by Embodiment 1 of the present invention;
Fig. 3 is a flowchart of the memory early warning method provided by Embodiment 2 of the present invention;
Fig. 4 is a schematic structural diagram of the memory early warning device provided by Embodiment 3 of the present invention;
Fig. 5 is a schematic structural diagram of the server provided by Embodiment 4 of the present invention.
Detailed description
To make the objectives, technical solutions and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the drawings.
Embodiment 1
An embodiment of the present invention provides a method for memory early warning; see Fig. 1.
The method includes:
101: Obtain the currently set warning mode, and obtain the funnel frequency and warning threshold corresponding to the warning mode. The funnel frequency is the value subtracted each second from a preset leaky bucket counter;
102: Record in the leaky bucket counter the number of correctable errors that occur in the currently monitored memory, and decrement the leaky bucket counter according to the funnel frequency;
103: When the count recorded in the leaky bucket counter is found to have reached the warning threshold and every repair mode has been used up, send an early warning to the BMC (Baseboard Management Controller).
Fig. 2 shows the mathematical model for memory failure early warning. In the model, the horizontal axis represents the frequency at which correctable (ECC) errors occur, and the vertical axis represents the likelihood that the system suffers a serious failure at that ECC error frequency. Qualitative analysis of memory failures with this model gives the following conclusions:
The higher the frequency of correctable errors, the greater the likelihood that the system suffers a serious failure;
The larger the funnel frequency threshold chosen for the correctable-error frequency, the higher the accuracy of the early warning and the lower its coverage.
This embodiment obtains the funnel frequency and warning threshold of the current warning mode, decrements the leaky bucket counter at the funnel frequency while it counts the correctable errors of each memory module, and sends an early warning to the BMC when the counter reaches the warning threshold and every repair mode has been used up. This brings the time dimension into warnings on correctable memory errors and improves the accuracy and effectiveness of early warning.
Embodiment 2
An embodiment of the present invention provides a method for memory early warning; see Fig. 3.
The method includes:
301: Obtain the currently set warning mode, and obtain the funnel frequency and warning threshold corresponding to the warning mode. The funnel frequency is the value subtracted each second from a preset leaky bucket counter.
The warning modes include an accuracy mode and a coverage mode.
A memory error reported in accuracy mode must have the following characteristics: the errors occur at a high frequency and persist for some time. A memory error reported in coverage mode must have the following characteristic: the configured funnel frequency is below a certain threshold.
Each warning mode contains settable parameters, which may include the funnel frequency and the warning threshold. Optionally, the warning threshold may be calculated from other parameters of the warning mode; in that case the parameters used to calculate it may include a duration parameter and a multiple parameter. The duration parameter is how long the memory errors last: if the duration is too short, it cannot be confirmed whether a fault has occurred, and the longer it is, the more accurate the judgement.
Comparing the accuracy mode with the coverage mode therefore gives the following conclusions:
The funnel frequency of the accuracy mode is greater than the funnel frequency of the coverage mode;
the higher the accuracy requirement, the greater the funnel frequency.
The duration parameter of the accuracy mode is greater than the duration parameter of the coverage mode.
The funnel frequency is between 1 per second and 100 per second; the duration parameter is between 3 seconds and 60 seconds; the multiple parameter is between 10 and 100.
Because correctable errors can actually occur at a frequency higher than the funnel frequency, and to allow for even higher error frequencies, the warning threshold is scaled up by a factor of 10 to 100 through the multiple parameter; the higher the accuracy requirement, the larger the multiple.
The server administrator can, based on the characteristics of each warning mode, configure the server management settings and select the warning mode that meets current needs.
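The mode parameters and ranges described above can be sketched as a small configuration structure. This is only an illustration: the class name, field names and the concrete values chosen for the two modes are assumptions, since the text fixes only the allowed ranges and the ordering between the accuracy and coverage modes.

```python
from dataclasses import dataclass

# Ranges quoted in the text: funnel frequency 1-100 per second,
# duration 3-60 seconds, multiple 10-100.
FUNNEL_RANGE = (1, 100)
DURATION_RANGE = (3, 60)
MULTIPLE_RANGE = (10, 100)


@dataclass(frozen=True)
class WarningMode:
    """Settable parameters of one warning mode (hypothetical names)."""
    name: str
    funnel_per_second: int
    duration_seconds: int
    multiple: int

    def __post_init__(self):
        # Reject any parameter outside the ranges given in the text.
        for value, (low, high), label in (
            (self.funnel_per_second, FUNNEL_RANGE, "funnel frequency"),
            (self.duration_seconds, DURATION_RANGE, "duration"),
            (self.multiple, MULTIPLE_RANGE, "multiple"),
        ):
            if not low <= value <= high:
                raise ValueError(f"{label} {value} outside [{low}, {high}]")


# Illustrative values only; the text fixes the ordering, not the numbers.
ACCURACY = WarningMode("accuracy", funnel_per_second=10, duration_seconds=50, multiple=100)
COVERAGE = WarningMode("coverage", funnel_per_second=1, duration_seconds=3, multiple=10)

# Accuracy mode uses the larger funnel frequency and the longer duration.
assert ACCURACY.funnel_per_second > COVERAGE.funnel_per_second
assert ACCURACY.duration_seconds > COVERAGE.duration_seconds
```

Validating the ranges at construction time means an out-of-range setting fails immediately rather than producing an unexpected threshold later.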
The existing leaky bucket counting mechanism for correctable memory errors also has a funnel frequency, which is a settable parameter. Once the funnel frequency is set, the leaky bucket counter, while recording the number of correctable errors in each monitored memory module, is also decremented according to the funnel frequency, reducing the count recorded in it.
The causes of memory faults fall mainly into soft faults (such as bit flips caused by cosmic rays), transient faults (such as crosstalk on data lines) and hard faults (such as a damaged memory chip or a failed bit).
A soft fault may momentarily cause many memory errors, but they do not persist: this kind of memory error is corrected quickly and does not recur after a certain time. Because errors caused by soft faults are still recorded in the leaky bucket counter, however, the recorded count can trigger an early warning if it reaches the specified threshold, producing a false alarm. Decrementing the leaky bucket counter at the funnel frequency lets the count of soft-fault errors recorded in the counter fall over time, so that after a while it no longer triggers an early warning.
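The effect of the funnel frequency on soft-fault bursts can be illustrated with a small simulation. This is a sketch under stated assumptions, not the hardware mechanism: time is divided into one-second ticks, the counter is compared with the threshold after each tick's errors are recorded and before the decrement, and the function name and the burst pattern are invented for the example.

```python
def simulate(errors_per_second, funnel_per_second, threshold):
    """Return (per-second counter values, whether the threshold was reached).

    errors_per_second: list with the number of correctable errors seen in
    each one-second tick. The counter never drops below zero.
    """
    count = 0
    reached = False
    trace = []
    for errors in errors_per_second:
        count += errors                          # record this tick's errors
        if count >= threshold:
            reached = True
        count = max(0, count - funnel_per_second)  # leak once per second
        trace.append(count)
    return trace, reached


# Hypothetical pattern: a burst of 60 errors once a minute for an hour.
bursts = ([60] + [0] * 59) * 60

# Pure counting (no leak) accumulates 3600 errors and crosses 3000.
_, pure_counting_alarm = simulate(bursts, funnel_per_second=0, threshold=3000)

# With a funnel frequency of 1 per second each burst drains before the next.
trace, leaky_alarm = simulate(bursts, funnel_per_second=1, threshold=3000)

assert pure_counting_alarm is True
assert leaky_alarm is False and trace[-1] == 0
```

In this pattern the pure counter raises what the text calls a false alarm, while the decremented counter never rises above a single burst.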
In this embodiment, step 301 can be implemented through the following steps:
3011: Obtain the currently set warning mode;
3012: Obtain the funnel frequency corresponding to the warning mode;
3013: Obtain the duration parameter and multiple parameter corresponding to the warning mode;
3014: Calculate the warning threshold corresponding to the warning mode from the duration parameter, the multiple parameter and the funnel frequency.
The warning threshold can be calculated as:
Warning threshold = duration parameter × funnel frequency × multiple parameter;
For example, with funnel frequency = 10 per second, duration parameter = 50 seconds and multiple parameter = 10, the warning threshold is 5000.
In the extreme case where the accuracy requirement is low and few missed warnings are wanted:
funnel frequency = 1 per second, maximum duration = 3 seconds, multiple parameter = 10, so the warning threshold is 30;
A typical real system demands fairly high accuracy while also wanting few missed warnings, so a practical balance can be configured as follows:
funnel frequency = 1 per second, maximum duration = 60 seconds, multiple parameter = 100, so the warning threshold is 6000.
Further, in the extreme case where the accuracy requirement is very high, the calculated warning threshold may be too large, so the warning threshold needs an upper limit. If the warning threshold is found to exceed the upper limit, it must be reset to the upper limit; when this situation is detected, step 3015 is executed.
3015: If the calculated warning threshold exceeds the upper limit, set the warning threshold to the upper limit.
For example, in the extreme case where the accuracy requirement is very high:
funnel frequency = 100 per second, maximum duration = 60 seconds, multiple parameter = 100, so the calculated warning threshold is 600000; the upper limit is 32767, so the warning threshold is set to 32767.
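The threshold calculation of steps 3014 and 3015 can be sketched as follows. The function and parameter names are assumptions made for the example; the formula, the four parameter sets and the upper limit of 32767 are the ones quoted in the text.

```python
UPPER_LIMIT = 32767  # upper limit used in the example in the text


def warning_threshold(funnel_per_second, duration_seconds, multiple,
                      upper_limit=UPPER_LIMIT):
    """Duration x funnel frequency x multiple, capped at the upper limit."""
    raw = duration_seconds * funnel_per_second * multiple
    return min(raw, upper_limit)


# The worked examples from the text.
assert warning_threshold(10, 50, 10) == 5000       # first example
assert warning_threshold(1, 3, 10) == 30           # low accuracy, few misses
assert warning_threshold(1, 60, 100) == 6000       # practical balance
assert warning_threshold(100, 60, 100) == 32767    # 600000 capped at the limit
```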
302: Record in the leaky bucket counter the number of correctable errors that occur in the currently monitored memory, and decrement the leaky bucket counter according to the funnel frequency.
For example, if correctable errors occur at 100 per second and the funnel frequency is 50 per second, the count in the leaky bucket counter increases by 50 each second.
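The arithmetic in this example can be written out, together with a rough estimate of how long a steady error rate takes to reach a threshold. This is a sketch assuming a constant error rate and a counter that starts at zero; the function names and the threshold of 5000 used below are chosen for illustration.

```python
import math


def net_growth_per_second(error_rate, funnel_rate):
    """Errors recorded per second minus the per-second decrement."""
    return error_rate - funnel_rate


def seconds_to_threshold(error_rate, funnel_rate, threshold):
    """Seconds for a counter starting at zero to reach the threshold.

    Returns None when the errors do not outpace the funnel frequency,
    because the counter then never accumulates.
    """
    net = net_growth_per_second(error_rate, funnel_rate)
    if net <= 0:
        return None
    return math.ceil(threshold / net)


assert net_growth_per_second(100, 50) == 50            # the example in the text
assert seconds_to_threshold(100, 50, 5000) == 100      # 5000 / 50 per second
assert seconds_to_threshold(50, 50, 5000) is None      # never accumulates
```

The estimate shows why only errors that are both frequent and persistent reach the threshold: an error rate at or below the funnel frequency never accumulates at all.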
303: When the count recorded in the leaky bucket counter is found to have reached the warning threshold and every repair mode has been used up, send an early warning to the BMC.
When the count recorded in the leaky bucket counter reaches the warning threshold, a RAS (Reliability, Availability, Serviceability) memory repair operation can be started. A RAS memory repair operation can resolve the failure of a memory chip, but it comes at a corresponding cost.
When the count recorded in the leaky bucket counter reaches the warning threshold and the various repair modes have all been used, the Device Tagging handling function must be executed. Device Tagging is the last handling function, usable only after all the preceding repair functions have been exhausted; using it, however, costs the ability to correct single-bit errors.
If Device Tagging has already been performed, all RAS handling actions have been used up, and the SMI interrupt handler sends an early-warning notification to the BMC.
The early warning can be sent as follows:
3031: When the count recorded in the leaky bucket counter is found to have reached the warning threshold, trigger an SMI (System Management Interrupt);
3032: When the SMI interrupt handler detects that every repair mode has been used up, send an early-warning notification message to the BMC.
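The escalation order described in step 303 can be sketched as a small state holder. Everything named here is hypothetical: the class, the method names, the placeholder repair names and the message text are assumptions, and a real implementation would live in firmware rather than Python. The sketch only models the order in the text, in which repairs are tried first, Device Tagging comes last, and the BMC is notified once nothing is left.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryMonitor:
    threshold: int
    # Repair modes still available, in the order they would be tried.
    # The first two names are placeholders; Device Tagging is last,
    # as in the text.
    repairs_left: list = field(
        default_factory=lambda: ["repair_a", "repair_b", "device_tagging"])
    bmc_messages: list = field(default_factory=list)

    def on_count(self, count):
        """Called with the current counter value; returns the action taken."""
        if count < self.threshold:
            return "none"
        return self._smi_handler()

    def _smi_handler(self):
        # Stand-in for the SMI interrupt handler.
        if self.repairs_left:
            return "repair:" + self.repairs_left.pop(0)
        # Every repair mode has been used up: notify the BMC.
        self.bmc_messages.append("memory early warning")
        return "warn_bmc"


monitor = MemoryMonitor(threshold=30)
actions = [monitor.on_count(c) for c in (10, 30, 30, 30, 30)]
assert actions == ["none", "repair:repair_a", "repair:repair_b",
                   "repair:device_tagging", "warn_bmc"]
```

Whether the counter is reset after each repair is not stated in the text, so the sketch leaves the counter value to the caller.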
The memory early warning method provided by this embodiment ensures that the warnings it issues are accurate and reliable in three respects:
A) High-frequency correctable errors greatly increase the likelihood of uncorrectable errors, so a warning is warranted;
B) High-frequency correctable errors cause frequent error correction, which seriously affects system performance, so a warning is warranted;
C) (Whether Device Tagging is enabled is user-selectable; it is enabled by default.) Exceeding the threshold triggers Device Tagging to resolve the failure of a single chip, but this also causes the memory rank to lose its single-bit error correction capability, leaving it on the edge of danger, so a warning is warranted.
This embodiment obtains the funnel frequency and warning threshold of the current warning mode, decrements the leaky bucket counter at the funnel frequency while it counts the correctable errors of each memory module, and sends an early warning to the BMC when the counter reaches the warning threshold and every repair mode has been used up. This brings the time dimension into warnings on correctable memory errors and improves the accuracy and effectiveness of early warning.
Embodiment 3
An embodiment of the present invention provides a device for memory early warning; see Fig. 4. The device includes:
an acquisition module 401, configured to obtain the currently set warning mode and to obtain the funnel frequency and warning threshold corresponding to the warning mode, where the funnel frequency is the value subtracted each second from a preset leaky bucket counter;
a processing module 402, configured to record in the leaky bucket counter the number of correctable errors that occur in the currently monitored memory, and to decrement the leaky bucket counter according to the funnel frequency;
a warning module 403, configured to send an early warning to the baseboard management controller (BMC) when the count recorded in the leaky bucket counter is found to have reached the warning threshold and every repair mode has been used up.
Optionally, the acquisition module 401 includes:
an acquiring unit, configured to obtain the duration parameter and multiple parameter corresponding to the warning mode;
a computing unit, configured to calculate the warning threshold corresponding to the warning mode from the duration parameter, the multiple parameter and the funnel frequency.
Optionally, the acquisition module 401 further includes:
a setting unit, configured to set the warning threshold to an upper limit if the calculated warning threshold exceeds the upper limit.
Optionally, the warning modes include an accuracy mode and a coverage mode;
the funnel frequency of the accuracy mode is greater than the funnel frequency of the coverage mode;
the duration parameter of the accuracy mode is greater than the duration parameter of the coverage mode.
Optionally, the funnel frequency is between 1 per second and 100 per second; the duration parameter is between 3 seconds and 60 seconds; the multiple parameter is between 10 and 100.
Optionally, the warning module 403 includes:
an interrupt unit, configured to trigger a system management interrupt (SMI) when the count recorded in the leaky bucket counter is found to have reached the warning threshold;
a sending unit, configured to send an early-warning notification message to the BMC when the SMI interrupt handler detects that every repair mode has been used up.
This embodiment obtains the funnel frequency and warning threshold of the current warning mode, decrements the leaky bucket counter at the funnel frequency while it counts the correctable errors of each memory module, and sends an early warning to the BMC when the counter reaches the warning threshold and every repair mode has been used up. This brings the time dimension into warnings on correctable memory errors and improves the accuracy and effectiveness of early warning.
Embodiment 4
An embodiment of the present invention provides a server; see Fig. 5 for its structure. The server includes a memory 501 and at least one processor 502, the processor 502 being configured to perform the following operations:
obtaining the currently set warning mode, and obtaining the funnel frequency and warning threshold corresponding to the warning mode, where the funnel frequency is the value subtracted each second from a preset leaky bucket counter;
recording in the leaky bucket counter the number of correctable errors that occur in the currently monitored memory, and decrementing the leaky bucket counter according to the funnel frequency;
when the count recorded in the leaky bucket counter is found to have reached the warning threshold and every repair mode has been used up, sending an early warning to the baseboard management controller (BMC).
Obtaining the warning threshold corresponding to the warning mode includes:
obtaining the duration parameter and multiple parameter corresponding to the warning mode;
calculating the warning threshold corresponding to the warning mode from the duration parameter, the multiple parameter and the funnel frequency.
After the warning threshold corresponding to the warning mode is calculated, the method further includes:
if the calculated warning threshold exceeds an upper limit, setting the warning threshold to the upper limit.
The warning modes include an accuracy mode and a coverage mode;
the funnel frequency of the accuracy mode is greater than the funnel frequency of the coverage mode;
the duration parameter of the accuracy mode is greater than the duration parameter of the coverage mode.
The funnel frequency is between 1 per second and 100 per second; the duration parameter is between 3 seconds and 60 seconds; the multiple parameter is between 10 and 100.
Sending an early warning to the BMC when the count recorded in the leaky bucket counter is found to have reached the warning threshold and every repair mode has been used up includes:
when the count recorded in the leaky bucket counter is found to have reached the warning threshold, triggering a system management interrupt (SMI);
when the SMI interrupt handler detects that every repair mode has been used up, sending an early-warning notification message to the BMC.
This embodiment obtains the funnel frequency and warning threshold of the current warning mode, decrements the leaky bucket counter at the funnel frequency while it counts the correctable errors of each memory module, and sends an early warning to the BMC when the counter reaches the warning threshold and every repair mode has been used up. This brings the time dimension into warnings on correctable memory errors and improves the accuracy and effectiveness of early warning.
A person of ordinary skill in the art will understand that all or some of the steps of the above embodiments can be implemented by hardware, or by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc or the like.
The above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (10)

1. A memory early-warning method, characterized in that the method comprises:
obtaining the currently set early-warning mode, and obtaining the funnel frequency corresponding to the early-warning mode, wherein the funnel frequency is the value subtracted per second from a preset funnel counter, and the funnel counter is used to record the number of correctable errors occurring in each memory module;
obtaining the duration parameter and the multiple parameter corresponding to the early-warning mode, and calculating the product of the duration parameter, the multiple parameter, and the funnel frequency to obtain the early-warning threshold corresponding to the early-warning mode;
recording in the funnel counter the number of correctable errors occurring in the currently monitored memory, and performing a subtraction operation on the funnel counter according to the funnel frequency; and
when it is detected that the number recorded in the funnel counter has reached the early-warning threshold and every repair mode has been used, sending an early warning to a baseboard management controller (BMC).
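The steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the class name `FunnelCounter`, its method names, the explicit `now` timestamps, and the choice to floor the counter at zero are all assumptions made for illustration. Only the threshold formula (duration parameter × multiple parameter × funnel frequency) and the per-second subtraction come from the claim.

```python
from dataclasses import dataclass


@dataclass
class FunnelCounter:
    """Illustrative funnel (leaky-bucket style) counter for correctable errors."""
    funnel_frequency: float  # value subtracted from the counter per second
    duration_s: float        # duration parameter, in seconds
    multiple: float          # multiple parameter
    count: float = 0.0       # current counter value
    last_time: float = 0.0   # time of the last update, in seconds

    @property
    def threshold(self) -> float:
        # Early-warning threshold = duration * multiple * funnel frequency.
        return self.duration_s * self.multiple * self.funnel_frequency

    def _leak(self, now: float) -> None:
        # Subtract funnel_frequency for each elapsed second.
        # Assumption: the counter is floored at zero.
        elapsed = max(0.0, now - self.last_time)
        self.count = max(0.0, self.count - elapsed * self.funnel_frequency)
        self.last_time = now

    def record_errors(self, n: int, now: float) -> None:
        # Apply the pending subtraction, then add the newly observed errors.
        self._leak(now)
        self.count += n

    def should_warn(self, now: float, all_repairs_used: bool) -> bool:
        # Warn only if the threshold is reached AND every repair mode is used.
        self._leak(now)
        return self.count >= self.threshold and all_repairs_used
```

With a funnel frequency of 1 per second, a duration parameter of 3 seconds, and a multiple parameter of 10 (hypothetical values), the threshold is 30. A burst of 40 errors still exceeds it five seconds later (35 remain), but not twenty seconds later (20 remain), which is how the time dimension enters the decision.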
2. The method according to claim 1, characterized in that, after the early-warning threshold corresponding to the early-warning mode is calculated, the method further comprises:
if the calculated early-warning threshold exceeds an upper limit value, setting the early-warning threshold to the upper limit value.
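The capping step of claim 2 amounts to taking the smaller of the calculated product and the upper limit. A minimal sketch follows; the function name and the upper limit of 32767 are hypothetical, since the claim does not state a specific limit.

```python
def early_warning_threshold(duration_s: float, multiple: float,
                            funnel_frequency: float, upper_limit: float) -> float:
    """Return duration * multiple * funnel frequency, capped at upper_limit."""
    raw = duration_s * multiple * funnel_frequency
    # If the calculated threshold exceeds the upper limit, use the upper limit.
    return min(raw, upper_limit)


# Hypothetical upper limit, chosen only for this example.
UPPER_LIMIT = 32767

small = early_warning_threshold(3, 10, 1, UPPER_LIMIT)      # product is 30
large = early_warning_threshold(60, 100, 100, UPPER_LIMIT)  # product is 600000
```

Here `small` keeps the calculated value of 30, while `large` is reduced from 600000 to the assumed limit.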
3. The method according to claim 1, characterized in that the early-warning mode comprises an accuracy mode and a coverage mode;
the funnel frequency corresponding to the accuracy mode is greater than the funnel frequency corresponding to the coverage mode; and
the duration parameter corresponding to the accuracy mode is greater than the duration parameter corresponding to the coverage mode.
4. The method according to claim 1, characterized in that the funnel frequency is between 1 per second and 100 per second; the duration parameter is between 3 seconds and 60 seconds; and the multiple parameter is between 10 and 100.
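The relationships in claims 3 and 4 can be expressed as a small parameter table plus a range check. This is a sketch under stated assumptions: the concrete values assigned to each mode are hypothetical, the names `ModeParams`, `MODES`, and `in_claimed_ranges` are invented for illustration, and the range endpoints are treated as inclusive, which the claims do not specify.

```python
from typing import NamedTuple


class ModeParams(NamedTuple):
    funnel_frequency: float  # subtractions per second
    duration_s: float        # duration parameter, in seconds
    multiple: float          # multiple parameter


# Hypothetical values; the claims only require accuracy > coverage
# for the funnel frequency and the duration parameter.
MODES = {
    "accuracy": ModeParams(funnel_frequency=10.0, duration_s=30.0, multiple=10.0),
    "coverage": ModeParams(funnel_frequency=1.0, duration_s=3.0, multiple=10.0),
}


def in_claimed_ranges(p: ModeParams) -> bool:
    # Ranges from claim 4; inclusive endpoints are an assumption.
    return (1.0 <= p.funnel_frequency <= 100.0
            and 3.0 <= p.duration_s <= 60.0
            and 10.0 <= p.multiple <= 100.0)
```

Under these example values the accuracy mode yields a threshold of 3000 and the coverage mode a threshold of 30, so the coverage mode warns on far fewer accumulated errors.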
5. The method according to claim 1, characterized in that sending an early warning to the BMC when it is detected that the number recorded in the funnel counter has reached the early-warning threshold and every repair mode has been used comprises:
when it is detected that the number recorded in the funnel counter has reached the early-warning threshold, triggering a system management interrupt (SMI); and
when the SMI interrupt handler detects that every repair mode has been used, sending an early-warning notification message to the BMC.
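The two-stage flow of claim 5 can be sketched as two functions: one that raises the interrupt when the threshold is reached, and one that stands in for the interrupt handler. Real SMI handling runs in firmware, so everything here is a software stand-in; the function names, the callback parameters, the repair-mode names, and the message text are hypothetical. What the handler does when some repair mode is still unused is outside the scope of this sketch.

```python
from typing import Callable, Dict


def on_counter_update(count: float, threshold: float,
                      trigger_smi: Callable[[], None]) -> bool:
    """Stage 1: trigger the SMI once the counter reaches the threshold."""
    if count >= threshold:
        trigger_smi()
        return True
    return False


def smi_handler(repair_modes_used: Dict[str, bool],
                notify_bmc: Callable[[str], None]) -> bool:
    """Stage 2: notify the BMC only if every repair mode has been used."""
    if all(repair_modes_used.values()):
        notify_bmc("memory early warning")
        return True
    return False


# Usage with in-memory stand-ins for the interrupt and the BMC channel.
bmc_messages = []
repairs = {"repair_a": True, "repair_b": True}  # hypothetical repair modes
fired = on_counter_update(
    count=35, threshold=30,
    trigger_smi=lambda: smi_handler(repairs, bmc_messages.append),
)
```

Splitting the check this way keeps the inexpensive threshold comparison separate from the repair-state inspection, which only runs once the interrupt has fired.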
6. A memory early-warning device, characterized in that the device comprises:
an acquisition module, configured to obtain the currently set early-warning mode and to obtain the funnel frequency corresponding to the early-warning mode, wherein the funnel frequency is the value subtracted per second from a preset funnel counter, and the funnel counter is used to record the number of correctable errors occurring in each memory module;
wherein the acquisition module comprises an acquiring unit and a computing unit;
the acquiring unit is configured to obtain the duration parameter and the multiple parameter corresponding to the early-warning mode;
the computing unit is configured to calculate the product of the duration parameter, the multiple parameter, and the funnel frequency to obtain the early-warning threshold corresponding to the early-warning mode;
a processing module, configured to record in the funnel counter the number of correctable errors occurring in the currently monitored memory, and to perform a subtraction operation on the funnel counter according to the funnel frequency; and
a warning module, configured to send an early warning to a baseboard management controller (BMC) when it is detected that the number recorded in the funnel counter has reached the early-warning threshold and every repair mode has been used.
7. The device according to claim 6, characterized in that the acquisition module further comprises:
a setting unit, configured to set the early-warning threshold to an upper limit value if the calculated early-warning threshold exceeds the upper limit value.
8. The device according to claim 6, characterized in that the early-warning mode comprises an accuracy mode and a coverage mode;
the funnel frequency corresponding to the accuracy mode is greater than the funnel frequency corresponding to the coverage mode; and
the duration parameter corresponding to the accuracy mode is greater than the duration parameter corresponding to the coverage mode.
9. The device according to claim 6, characterized in that the funnel frequency is between 1 per second and 100 per second; the duration parameter is between 3 seconds and 60 seconds; and the multiple parameter is between 10 and 100.
10. The device according to claim 6, characterized in that the warning module comprises:
an interrupt unit, configured to trigger a system management interrupt (SMI) when it is detected that the number recorded in the funnel counter has reached the early-warning threshold; and
a transmission unit, configured to send an early-warning notification message to the BMC when the SMI interrupt handler detects that every repair mode has been used.
CN201510500335.4A 2015-08-14 2015-08-14 A kind of method and device of memory early warning Active CN105117301B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510500335.4A CN105117301B (en) 2015-08-14 2015-08-14 A kind of method and device of memory early warning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510500335.4A CN105117301B (en) 2015-08-14 2015-08-14 A kind of method and device of memory early warning

Publications (2)

Publication Number Publication Date
CN105117301A CN105117301A (en) 2015-12-02
CN105117301B true CN105117301B (en) 2018-08-14

Family

ID=54665301

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510500335.4A Active CN105117301B (en) 2015-08-14 2015-08-14 A kind of method and device of memory early warning

Country Status (1)

Country Link
CN (1) CN105117301B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110008090A (en) * 2019-04-15 2019-07-12 苏州浪潮智能科技有限公司 A kind of method, apparatus and computer readable storage medium monitoring EMS memory error

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107590047B (en) * 2016-07-08 2021-02-12 佛山市顺德区顺达电脑厂有限公司 SMI signal timeout monitoring system and method
CN106445720A (en) * 2016-10-11 2017-02-22 郑州云海信息技术有限公司 Memory error recovery method and device
CN113407391A (en) * 2016-12-05 2021-09-17 华为技术有限公司 Fault processing method, computer system, substrate management controller and system
CN107077408A (en) 2016-12-05 2017-08-18 华为技术有限公司 Method, computer system, baseboard management controller and the system of troubleshooting
CN109328340B (en) * 2017-09-30 2021-06-08 华为技术有限公司 Memory fault detection method and device and server
CN109992477B (en) * 2019-03-27 2021-07-16 联想(北京)有限公司 Information processing method and system for electronic equipment and electronic equipment
CN110780646B (en) * 2019-09-21 2021-11-26 苏州浪潮智能科技有限公司 Memory quality early warning method based on MES system
CN111008091A (en) * 2019-12-06 2020-04-14 苏州浪潮智能科技有限公司 Fault processing method, system and related device for memory CE
CN111459557B (en) * 2020-03-12 2023-04-07 烽火通信科技股份有限公司 Method and system for shortening starting time of server
US20240241805A1 (en) * 2021-09-25 2024-07-18 Intel Corporation Apparatus, computer-readable medium, and method for increasing memory error handling accuracy
CN115543677A (en) * 2022-11-29 2022-12-30 苏州浪潮智能科技有限公司 Correctable error processing method, device and equipment and readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102681909A (en) * 2012-04-28 2012-09-19 浪潮电子信息产业股份有限公司 Server early-warning method based on memory errors
JP5078582B2 (en) * 2007-12-10 2012-11-21 オムロンオートモーティブエレクトロニクス株式会社 Motor control device
CN103092739A (en) * 2013-01-18 2013-05-08 浪潮电子信息产业股份有限公司 Memory error checking and correcting (ECC) error reporting and alarm mechanism
CN103605602A (en) * 2013-11-29 2014-02-26 中国航空工业集团公司第六三一研究所 Method for filtering out malfunctions of distributed computer system

Also Published As

Publication number Publication date
CN105117301A (en) 2015-12-02

Similar Documents

Publication Publication Date Title
CN105117301B (en) A kind of method and device of memory early warning
US10430260B2 (en) Troubleshooting method, computer system, baseboard management controller, and system
US11119874B2 (en) Memory fault detection
US12014791B2 (en) Memory fault handling method and apparatus, device, and storage medium
US8639991B2 (en) Optimizing performance of an application
CN109710501B (en) Method and system for detecting data transmission stability of server
CN110224885B (en) Equipment monitoring alarm method and device, storage medium and electronic equipment
CN105471932B (en) Monitoring method, device and system for front-end application
CN104065526B (en) A kind of method and apparatus of server failure alarm
CN108845912A (en) Service interface calls the alarm method of failure and calculates equipment
CN105335262A (en) Method for automatically calculating and early warning faults of batch server components
CN107579861A (en) Website Usability alarm method, device and electronic equipment based on multi-line monitoring
EP3358467A1 (en) Fault processing method, computer system, baseboard management controller and system
CN106201753B (en) Method and system for processing PCIE errors in linux
CN107368058A (en) It is a kind of for the fault monitoring method of equipment, equipment and computer-readable medium
CN117076186B (en) Memory fault detection method, system, device, medium and server
CN108899059B (en) Detection method and equipment for solid state disk
CN111880992B (en) Monitoring and maintaining method for controller state in storage device
CN111586129A (en) Alarm method and device for data synchronization, electronic equipment and storage medium
CN107256192A (en) A kind of monitoring method of clock failure, device and server
CN115080362A (en) PCIE (peripheral component interface express) equipment speed reduction reporting method, system, equipment and storage medium
CN113742176A (en) Fault prediction method and device and electronic equipment
CN114003426A (en) Fault processing method and system and electronic equipment
CN114374627A (en) Method, device and system for restarting baseboard management controller and server
CN114610560A (en) System abnormity monitoring method, device and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20200420

Address after: 518129 Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen

Patentee after: HUAWEI TECHNOLOGIES Co.,Ltd.

Address before: 301, A building, room 3, building 301, foreshore Road, No. 310052, Binjiang District, Zhejiang, Hangzhou

Patentee before: Hangzhou Huawei Digital Technology Co.,Ltd.

TR01 Transfer of patent right

Effective date of registration: 20211223

Address after: 450046 Floor 9, building 1, Zhengshang Boya Plaza, Longzihu wisdom Island, Zhengdong New Area, Zhengzhou City, Henan Province

Patentee after: xFusion Digital Technologies Co., Ltd.

Address before: 518129 Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen

Patentee before: HUAWEI TECHNOLOGIES Co.,Ltd.