CN115686909A - Memory fault prediction method and device, storage medium and electronic device - Google Patents

Memory fault prediction method and device, storage medium and electronic device

Info

Publication number
CN115686909A
Authority
CN
China
Prior art keywords
current
type
period
memory
historical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211350534.8A
Other languages
Chinese (zh)
Inventor
信仕尧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202211350534.8A priority Critical patent/CN115686909A/en
Publication of CN115686909A publication Critical patent/CN115686909A/en
Pending legal-status Critical Current

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

Embodiments of the application provide a memory fault prediction method and device, a storage medium and an electronic device. The method comprises: acquiring memory fault information in a current period, wherein the memory fault information comprises the types of the faults occurring in the memory of a server in the current period; determining, according to the memory fault information, the number of times each type of fault occurs in the memory in the current period; and determining a prediction result for each type of fault occurring in the memory in the period following the current period according to the number of times each type of fault occurs in the current period and the number of times each type of fault occurred in a historical period, wherein the historical period comprises one or more periods before the current period. The application addresses the low timeliness with which the fault condition of server memory is determined, and thereby improves that timeliness.

Description

Memory fault prediction method and device, storage medium and electronic device
Technical Field
The embodiment of the application relates to the field of computers, in particular to a memory fault prediction method and device, a storage medium and an electronic device.
Background
With the rapid development of internet services, server stability has become crucial to supporting those services. Servers frequently suffer hardware faults, including memory faults, which may cause great loss. A server contains a large amount of memory, and in the most serious cases a memory fault causes machine downtime and a system crash, which is a fatal blow to upper-layer services. How to avoid the loss caused by server memory faults is therefore a problem that needs to be solved.
In the prior art, a BMC (Baseboard Management Controller) usually locates a memory fault only after the server has gone down, by consulting the related fault logs. This approach cannot avoid the server loss caused by the memory fault, and it cannot assess the current health of the server memory from the memory fault information.
No effective solution has yet been proposed for the technical problem in the related art that the fault condition of server memory is determined with low timeliness.
Disclosure of Invention
The embodiment of the application provides a memory fault prediction method and device, a storage medium and an electronic device, so as to at least solve the problem that timeliness for determining a fault condition of a server memory in the related art is low.
According to an embodiment of the present application, a method for predicting a memory failure is provided, including:
acquiring memory fault information in a current period, wherein the memory fault information comprises the type of faults occurring in the current period in a memory of a server;
determining, according to the memory fault information, the number of times each type of fault occurs in the memory in the current period;
and determining the prediction result of the faults of each type in the next period of the current period according to the number of times of the faults of each type in the current period and the number of times of the faults of each type in the historical period, wherein the historical period comprises one or more periods before the current period.
In an exemplary embodiment, the determining, according to the number of times that the various types of faults occur in the current period and the number of times that the various types of faults occur in the historical period, a prediction result of the various types of faults occurring in the next period of the current period in the memory includes:
in a case where the types include a first type and the historical period includes N periods before the current period, determining the predicted number of times that a fault of the first type occurs in the memory in the next period according to N first historical times and a first current time, where N is 1 or N is a positive integer greater than or equal to 2, the first current time is the number of times that a fault of the first type occurs in the memory in the current period, and the N first historical times include the number of times that a fault of the first type occurred in the memory in each of the N periods; or
in a case where the types include the first type and the historical period includes the N periods before the current period, determining a predicted probability that a fault of the first type occurs in the memory in the next period according to the N first historical times, N total historical times, the first current times and the current total times, where N is 1 or N is a positive integer greater than or equal to 2, the first current times is the number of times that a fault of the first type occurs in the memory in the current period, the current total times is the total number of times that faults of all the types occur in the memory in the current period, the N first historical times include the number of times that a fault of the first type occurred in the memory in each of the N periods, and the N total historical times include the total number of times that faults of all the types occurred in the memory in each of the N periods.
In an exemplary embodiment, the determining, according to the N first historical times and the first current times, the predicted number of times that the first type of fault occurs in the next cycle includes:
determining a first fitting curve in a target coordinate system according to the N first historical times corresponding to the N cycles and the first current times corresponding to the current cycle, wherein the abscissa of the target coordinate system represents time, the ordinate of the target coordinate system represents the times of the first type of fault occurring in the memory, and the first fitting curve at least comprises points corresponding to the N cycles, the current cycle and the next cycle respectively;
and determining the value of the point on the ordinate corresponding to the next period on the first fitting curve as the prediction frequency.
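The curve-fitting step above can be sketched as follows. The source does not say which family of curves is fitted, so this minimal sketch assumes a least-squares straight line over equally spaced periods; the function names `fit_line` and `predict_next_count` are hypothetical.

```python
from typing import Sequence, Tuple


def fit_line(ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line through the points (0, ys[0]), (1, ys[1]), ...

    Assumption: periods are equally spaced, so period k sits at x = k.
    """
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two periods to fit a line")
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_next_count(historical_counts: Sequence[int], current_count: int) -> float:
    """Read the fitted line at the next period to get the predicted count."""
    ys = list(historical_counts) + [current_count]
    slope, intercept = fit_line(ys)
    # The next period sits at x = len(ys); a fault count cannot be negative.
    return max(0.0, slope * len(ys) + intercept)
```

With N = 1 the line passes exactly through the historical point and the current point, so the sketch still works for the smallest history the text allows.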
In an exemplary embodiment, the determining a first fitted curve in a target coordinate system according to the N first historical times corresponding to the N cycles and the first current time corresponding to the current cycle includes:
and under the condition that a history fitting curve exists in the target coordinate system, correcting the history fitting curve according to the N first history times corresponding to the N periods and the first current time corresponding to the current period to obtain the first fitting curve, wherein the history fitting curve is a fitting curve determined according to at least part of the N first history times and the periods corresponding to the at least part of the N periods.
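One way to correct an existing fitted curve without refitting from scratch is to keep running sums and fold each new period into them. The sketch below again assumes a straight-line fit, which the source does not specify; the class name `RunningLineFit` is hypothetical.

```python
class RunningLineFit:
    """Least-squares line that can be corrected as new periods arrive."""

    def __init__(self) -> None:
        self.n = 0
        self.sum_x = self.sum_y = self.sum_xx = self.sum_xy = 0.0

    def add(self, x: float, y: float) -> None:
        """Fold one (period index, fault count) point into the fit."""
        self.n += 1
        self.sum_x += x
        self.sum_y += y
        self.sum_xx += x * x
        self.sum_xy += x * y

    def predict(self, x: float) -> float:
        """Evaluate the current fitted line at period index x."""
        denom = self.n * self.sum_xx - self.sum_x ** 2
        if self.n < 2 or denom == 0:
            raise ValueError("need at least two distinct periods")
        slope = (self.n * self.sum_xy - self.sum_x * self.sum_y) / denom
        intercept = (self.sum_y - slope * self.sum_x) / self.n
        return slope * x + intercept
```

The historical curve corresponds to the state before the latest `add` call; adding the current period's point yields the corrected curve.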
In an exemplary embodiment, the determining, according to the N first historical times, the N total historical times, the first current time, and the total current time, the predicted probability that the first type of fault occurs in the next period includes:
determining a second fitting curve under a target coordinate system according to N first historical probabilities corresponding to the N periods and the first current probability corresponding to the current period, wherein an abscissa of the target coordinate system represents time, an ordinate of the target coordinate system represents probability of the first type of fault occurring in the memory, the second fitting curve at least includes points corresponding to the N periods, the current period and the next period respectively, the N first historical probabilities are ratios of each of the N first historical times to a corresponding historical total number of the N historical total times, and the first current probability is a ratio of the first current time to the current total number of times;
and determining the value of the point corresponding to the next period on the second fitting curve on the ordinate as the prediction probability.
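The probability variant can be sketched in the same way: each period's probability is the ratio of first-type faults to all faults in that period, and the fitted curve is read at the next period. The straight-line fit, the treatment of a period with no faults as probability 0, and the clamping to [0, 1] are assumptions of this sketch, and the function name is hypothetical.

```python
from typing import Sequence


def predict_next_probability(type_counts: Sequence[int],
                             total_counts: Sequence[int]) -> float:
    """Both sequences run from the oldest period to the current period."""
    if len(type_counts) != len(total_counts) or len(type_counts) < 2:
        raise ValueError("need matching sequences covering at least two periods")
    # Assumption: a period with no faults at all contributes probability 0.
    probs = [c / t if t else 0.0 for c, t in zip(type_counts, total_counts)]
    n = len(probs)
    mean_x = (n - 1) / 2
    mean_p = sum(probs) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (p - mean_p) for x, p in enumerate(probs))
    slope = sxy / sxx
    predicted = mean_p + slope * (n - mean_x)  # line evaluated at x = n
    # Assumption: clamp so the result remains a valid probability.
    return min(1.0, max(0.0, predicted))
```

For example, first-type counts of 3, 5 and 7 against totals of 6, 10 and 20 give per-period probabilities of 0.5, 0.5 and 0.35.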
In an exemplary embodiment, the determining, according to the number of times that each type of fault occurs in the memory in the current period and the number of times that each type of fault occurred in the memory in the historical period, a prediction result for each type of fault occurring in the memory in the next period of the current period includes:
in a case where the types include a second type and the historical period includes N periods before the current period, determining, according to N second historical times and a second current time, a predicted weighted cumulative number of times that a fault of the second type occurs in the memory in the next period, where N is 1 or N is a positive integer greater than or equal to 2, the second current time is the number of times that a fault of the second type occurs in the memory in the current period, and the N second historical times include the number of times that a fault of the second type occurred in the memory in each of the N periods; or
in a case where the types include the second type and the historical period includes N periods before the current period, determining a predicted weighted cumulative probability that a fault of the second type occurs in the memory in the next period according to the N second historical times, N total historical times, the second current times and the current total times, where N is 1 or N is a positive integer greater than or equal to 2, the second current times is the number of times that a fault of the second type occurs in the memory in the current period, the current total times is the total number of times that faults of all the types occur in the memory in the current period, the N second historical times include the number of times that a fault of the second type occurred in the memory in each of the N periods, and the N total historical times include the total number of times that faults of all the types occurred in the memory in each of the N periods.
In an exemplary embodiment, the determining, according to the N second historical times and the second current times, a predicted weighted cumulative number of times that the second type of fault occurs in the next period includes:
determining N historical weighted accumulation times according to the N second historical times corresponding to the N periods and N corresponding predetermined first historical weights;
determining the current weighting accumulated times according to the second current times corresponding to the current period and the corresponding predetermined first current weight;
determining a third fitting curve under a target coordinate system according to the N historical weighted accumulation times and the current weighted accumulation times, wherein the abscissa of the target coordinate system represents time, the ordinate of the target coordinate system represents the weighted accumulation times of the second type of fault occurring in the memory, and the third fitting curve at least comprises points corresponding to the N periods, the current period and the next period respectively;
and determining the value of the point on the ordinate corresponding to the next period on the third fitting curve as the prediction weighted accumulation frequency.
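The weighted accumulation can be sketched as below. The source does not give a formula, so this sketch assumes that the weighted cumulative count at a period is the running sum of weight times count over all periods up to and including it, and that the fitted curve is a straight line; the function names and the example weights are hypothetical.

```python
from itertools import accumulate
from typing import List, Sequence


def weighted_cumulative_counts(counts: Sequence[int],
                               weights: Sequence[float]) -> List[float]:
    """Running sum of weight * count, oldest period first, current period last."""
    if len(counts) != len(weights):
        raise ValueError("one weight is required per period")
    return list(accumulate(c * w for c, w in zip(counts, weights)))


def predict_next_weighted_cumulative(counts: Sequence[int],
                                     weights: Sequence[float]) -> float:
    """Fit a line to the weighted cumulative series and read it one period ahead."""
    ys = weighted_cumulative_counts(counts, weights)
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two periods to fit a line")
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return mean_y + slope * (n - mean_x)  # line evaluated at x = n
```

Giving later periods larger weights, as in weights of 0.5, 1.0 and 2.0, makes recent faults dominate the accumulated value; whether the original method weights periods this way is not stated.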
In an exemplary embodiment, the determining, according to the N second historical times, the N total historical times, the second current time, and the total current time, a predicted weighted cumulative probability that the second type of fault occurs in the next period in the memory includes:
determining N historical weighted cumulative probabilities according to N second historical probabilities corresponding to the N periods and N predetermined second historical weights, wherein the N second historical probabilities are ratios of each second historical time of the N second historical times to corresponding historical total times of the N historical total times;
determining a current weighted cumulative probability according to the N second historical probabilities, the N second historical weights, a second current probability corresponding to the current period and a corresponding predetermined second current weight, wherein the second current probability is a ratio of the second current times to the current total times;
determining a fourth fitted curve under a target coordinate system according to the N historical weighted cumulative probabilities and the current weighted cumulative probability, wherein the abscissa of the target coordinate system represents time, the ordinate of the target coordinate system represents the weighted cumulative probability of the second type of fault occurring in the memory, and the fourth fitted curve at least comprises points corresponding to the N periods, the current period and the next period respectively;
and determining the value of the point on the ordinate, corresponding to the next period, on the fourth fitting curve as the prediction weighted accumulation probability.
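A sketch of the weighted cumulative probability series follows. It assumes, as an interpretation of the text rather than a stated formula, that the value at each period is the running sum of weight times per-period probability, so the current value depends on the N historical probabilities and weights as well as on the current ones. The function name is hypothetical.

```python
from typing import List, Sequence


def weighted_cumulative_probabilities(type_counts: Sequence[int],
                                      total_counts: Sequence[int],
                                      weights: Sequence[float]) -> List[float]:
    """All sequences run from the oldest period to the current period."""
    if not (len(type_counts) == len(total_counts) == len(weights)):
        raise ValueError("sequences must cover the same periods")
    series: List[float] = []
    running = 0.0
    for count, total, weight in zip(type_counts, total_counts, weights):
        # Assumption: a period with no faults contributes probability 0.
        probability = count / total if total else 0.0
        running += weight * probability
        series.append(running)
    return series
```

Under this reading the accumulated value can exceed 1, so it behaves as a risk score rather than a probability in the strict sense. The last element of the series is the current weighted cumulative probability, and the series can be extrapolated to the next period with any curve fit.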
According to another embodiment of the present application, there is provided a memory failure prediction apparatus including:
an acquisition module, configured to acquire memory fault information in a current period, wherein the memory fault information comprises the types of the faults occurring in the memory of a server in the current period;
a first determining module, configured to determine, according to the memory fault information, the number of times each type of fault occurs in the memory in the current period;
a second determining module, configured to determine, according to the number of times that each type of fault occurs in the memory in the current period and the number of times that each type of fault occurred in the memory in the historical period, a prediction result for each type of fault occurring in the memory in the next period of the current period, where the historical period includes one or more periods before the current period.
According to yet another embodiment of the present application, there is further provided a computer-readable storage medium, in which a computer program is stored, where the computer program is configured to execute the steps in any of the above memory failure prediction method embodiments when executed.
According to another embodiment of the present application, there is also provided an electronic device, including a storage and a processor, where the storage stores a computer program, and the processor is configured to execute the computer program to perform the steps in any one of the above embodiments of the method for predicting a memory failure.
According to the application, when the memory of a server fails, memory fault information carrying the types of the faults that occurred in the memory in the current period can be acquired. The number of times each type of fault occurred in the current period can then be determined from the memory fault information, and a prediction result for each type of fault in the next period can be determined from the number of times each type of fault occurred in the current period and in the historical period. The fault condition of the server memory in the next period is thus analyzed and determined automatically from the memory fault information of the current period and of the historical period, rather than only after the server has gone down or crashed, which greatly improves the timeliness with which the fault condition of the server memory is determined. Therefore, the problem that the fault condition of the server memory is determined with low timeliness can be solved, and the effect of improving that timeliness is achieved.
Drawings
FIG. 1 is a diagram of a network architecture according to an embodiment of the present application;
FIG. 2 is an application scenario diagram of a memory fault prediction method according to an embodiment of the present application;
FIG. 3 is a flowchart of memory fault prediction according to an embodiment of the present application;
FIG. 4 is a schematic diagram of determining the number of times each type of fault occurs in the memory of a server according to an embodiment of the present application;
FIG. 5 is a schematic diagram of determining a predicted number of times according to an embodiment of the present application;
FIG. 6 is a schematic diagram of correcting a historical fitted curve according to an embodiment of the present application;
FIG. 7 is a schematic diagram of determining a predicted probability according to an embodiment of the present application;
FIG. 8 is a schematic diagram of determining a predicted weighted cumulative number of times according to an embodiment of the present application;
FIG. 9 is a schematic diagram of determining a predicted weighted cumulative probability according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a memory fault prediction method according to an embodiment of the present application;
FIG. 11 is a block diagram of a memory fault prediction apparatus according to an embodiment of the present application.
Detailed Description
Embodiments of the present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
Fig. 1 is a diagram of a network architecture according to an embodiment of the present application, where the embodiment of the present application may operate on the network architecture shown in fig. 1, and as shown in fig. 1, the network architecture may include, but is not limited to: server 102, server 104, and server 106.
The application scenario of the memory fault prediction method according to the embodiment of the present application may be explained by taking, but is not limited to taking, the server 102 as an example. Fig. 2 is an application scenario diagram of a memory fault prediction method according to an embodiment of the present application. As shown in Fig. 2, the server 102 may be provided with, but is not limited to, a memory 104, a BIOS (Basic Input Output System) 106, a BMC 108, and an SRAM (Static Random-Access Memory) 110. The server 102 may be used for, but is not limited to, providing support for service execution. During service execution, the BIOS 106 may monitor the memory 104; when the BIOS 106 detects that the memory 104 has failed, it may store, but is not limited to storing, the memory fault information in the SRAM 110. The memory fault information may carry, but is not limited to carrying, the times at which the memory 104 failed (t_1, t_2, ..., t_N), identifiers of the faults (memory fault 1, memory fault 2, ..., memory fault N), and the fault types, which may be, but are not limited to, UCE (Uncorrectable Error) or CE (Correctable Error). The BMC 108 may read and analyze the memory fault information stored in the SRAM 110.
In this embodiment, a method for predicting a memory failure of each server operating in the network architecture is provided, and fig. 3 is a flowchart of memory failure prediction according to an embodiment of the present application, and as shown in fig. 3, the flowchart includes the following steps:
step S302, obtaining memory fault information in a current period, wherein the memory fault information comprises the type of faults occurring in the current period in the memory of the server;
step S304, determining, according to the memory fault information, the number of times each type of fault occurs in the memory in the current period;
step S306, determining a prediction result of the occurrence of each type of fault in the next period of the current period according to the number of times of occurrence of each type of fault in the current period and the number of times of occurrence of each type of fault in the historical period, where the historical period includes one or more periods before the current period.
Through the above steps, when the memory of a server fails, memory fault information carrying the types of the faults that occurred in the memory in the current period can be acquired. The number of times each type of fault occurred in the current period can then be determined from the memory fault information, and a prediction result for each type of fault in the next period can be determined from the number of times each type of fault occurred in the current period and in the historical period. The fault condition of the server memory in the next period is thus analyzed and determined automatically from the memory fault information of the current period and of the historical period, rather than only after the server has gone down or crashed, which greatly improves the timeliness with which the fault condition of the server memory is determined. Therefore, the problem that the fault condition of the server memory is determined with low timeliness can be solved, and the effect of improving that timeliness is achieved.
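The per-type structure of step S306 can be sketched as follows: for every fault type, the counts from the historical periods and the current period form one series, and a predictor maps that series to a value for the next period. The predictor is deliberately pluggable because the source describes several; `last_difference` is only a hypothetical placeholder, not the method of the application, and all names here are illustrative.

```python
from typing import Callable, Dict, List, Sequence


def last_difference(series: Sequence[float]) -> float:
    """Placeholder predictor: repeat the most recent change once more."""
    if len(series) < 2:
        return float(series[-1])
    return max(0.0, float(series[-1] + (series[-1] - series[-2])))


def predict_next_period(history: List[Dict[str, int]],
                        current: Dict[str, int],
                        predictor: Callable[[Sequence[float]], float] = last_difference
                        ) -> Dict[str, float]:
    """history: per-period counts by fault type, oldest first; current: same shape."""
    fault_types = set(current)
    for past in history:
        fault_types.update(past)
    result: Dict[str, float] = {}
    for fault_type in sorted(fault_types):
        # A type that did not occur in a period counts as 0 for that period.
        series = [past.get(fault_type, 0) for past in history]
        series.append(current.get(fault_type, 0))
        result[fault_type] = predictor(series)
    return result
```

Any of the curve-fitting variants described in the exemplary embodiments could be passed in as `predictor` in place of the placeholder.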
The execution subject of the above steps may be a server, etc., but is not limited thereto.
The execution sequence of step S302 and step S304 may be interchanged, that is, step S304 may be executed first, and then step S302 may be executed.
In the technical solution provided in step S302, when the memory of the server fails, the memory fault information may be acquired in either of two ways, without limitation: the fault information for each time within a period may be acquired period by period, or the fault information for every time may be acquired first and then divided into different periods according to the time at which each fault occurred.
Optionally, in this embodiment, the memory fault information may carry, but is not limited to carrying, the type of each fault occurring in the memory of the server, and the fault types may include, but are not limited to, the UCE type and the CE type. A UCE-type fault cannot be corrected; it directly causes the server to go down and brings huge economic and business losses. For a CE-type fault, the server hardware can correct the fault using part of its resources, but if too many CE-type faults accumulate they may no longer be correctable, and a UCE-type fault may occur.
Optionally, in this embodiment, the memory fault information may include, but is not limited to, detailed information such as the time or period at which a fault occurred in the memory of the server, the hardware on which the fault occurred, the fault itself, and the type identifier of the fault, which improves the accuracy of locating the fault from the memory fault information.
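The second acquisition option, collecting all fault records and then dividing them by occurrence time, can be sketched as below. Fixed-length periods, numeric timestamps and the function name `split_into_periods` are assumptions of this sketch; the source does not say how period boundaries are chosen.

```python
from typing import Dict, Iterable, List, Tuple


def split_into_periods(events: Iterable[Tuple[float, str]],
                       start: float,
                       period_length: float) -> Dict[int, List[str]]:
    """Group (timestamp, fault_type) records into fixed-length periods.

    Period 0 covers [start, start + period_length), period 1 the next
    interval, and so on. Records earlier than `start` are ignored.
    """
    if period_length <= 0:
        raise ValueError("period_length must be positive")
    periods: Dict[int, List[str]] = {}
    for timestamp, fault_type in events:
        if timestamp < start:
            continue
        index = int((timestamp - start) // period_length)
        periods.setdefault(index, []).append(fault_type)
    return periods
```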
In the technical solution provided in step S304, when the memory of the server fails, the BMC of the server may, but is not limited to, obtain the memory fault information in the current period and then determine, according to the memory fault information, the number of times each type of fault occurred in the current period. Fig. 4 is a schematic diagram of determining the number of times each type of fault occurs in the memory of a server according to an embodiment of the present application. As shown in Fig. 4, the memory fault information may include, but is not limited to, the times at which the memory of the server failed (t_1, t_2, ..., t_N), identifiers of the faults (memory 1 damaged, memory 2 out of space, memory 3 leaking, ..., memory N leaking), and type identifiers of the faults (UCE, CE, CE, ..., CE). The memory fault information may be, but is not limited to being, input into the BMC, and the BMC may analyze and count it and output, for each period, the number of UCE-type faults, the number of CE-type faults, and the total number of faults of all types that occurred in the memory of the server.
In detail, the BMC may, but is not limited to, assign t_1 and t_2 to period T_1, assign t_3, ..., t_m to period T_2, and assign t_{m+1}, ..., t_N to period T_3. In period T_1, the memory of the server may have, for example, 3 UCE-type faults and 3 CE-type faults, 6 faults in total; in period T_2, 5 UCE-type faults and 5 CE-type faults, 10 faults in total; in period T_3, 7 UCE-type faults and 13 CE-type faults, 20 faults in total.
Optionally, in this embodiment, when the memory of the server fails, the memory fault information in the current period may be acquired and counted by fault type to determine the number of times each type of fault occurred in the current period. This avoids counting the specific faults in the memory of the server one by one and greatly improves the efficiency of fault counting.
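The counting described above amounts to a per-period tally by fault type. A minimal sketch, using the counts from the example as input; the record layout and the function name are hypothetical, and the sketch assumes no fault type is literally named "total".

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def count_faults_per_period(records: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, int]]:
    """records: (period_name, fault_type) pairs, one per fault occurrence."""
    table: Dict[str, Counter] = {}
    for period, fault_type in records:
        row = table.setdefault(period, Counter())
        row[fault_type] += 1
        row["total"] += 1  # faults of all types in this period
    return {period: dict(row) for period, row in table.items()}


# The example periods from the text, expanded into individual records.
records = (
    [("T1", "UCE")] * 3 + [("T1", "CE")] * 3
    + [("T2", "UCE")] * 5 + [("T2", "CE")] * 5
    + [("T3", "UCE")] * 7 + [("T3", "CE")] * 13
)
counts = count_faults_per_period(records)
```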
In the technical solution provided in step S306, when the memory of the server fails, the number of times that each type of failure occurs in the current period may be determined according to, but is not limited to, the memory failure information of the memory of the server in the current period, and in such a case, the failure condition of the memory of the server in the next period may be automatically analyzed and determined according to, but is not limited to, the memory failure information of the memory of the server in the current period and the memory failure information of the memory of the server in the history period. By the method, on one hand, the problem that the fault of the memory of the server is analyzed only when the server is down or crashed is avoided, and the timeliness of determining the fault condition of the memory of the server is greatly improved; on the other hand, the method can automatically find the possible faults in the next period in advance, and greatly reduces the service loss caused by the faults in the memory of the server.
Optionally, in this embodiment, the history period may include, but is not limited to, a partial period or all periods before the current period, and the partial period may be, but is not limited to, a period within a certain time range from the time of the current period. The method has the advantages that the resource consumption is low, and the time span between the historical period and the current period can well determine the prediction result which can well reflect the influence of the fault condition of the historical period which is closer to the current period and the fault condition of the current period on the next period. The prediction result is determined according to the number of times of the faults of each type in the current period and the number of times of the faults of each type in all periods before the current period, and the prediction result comprehensively reflected by the data according to the mode can comprehensively reflect the influence of the fault condition of each historical period and the fault condition of the current period on the next period.
Optionally, in this embodiment, after determining that the prediction result of each type of fault occurs in the next cycle of the current cycle in the memory of the server, the method further includes: under the condition that the prediction result of the UCE type fault in the next period of the current period exists in the memory of the server, operation and maintenance personnel of the server can be reminded, and service data stored in the memory of the server can be migrated to the memory of the standby server, so that the service loss and the data loss caused by the UCE type fault can be greatly reduced. Under the condition that the prediction result that the CE type faults occur in the next period of the current period in the internal memory of the server is determined, although the faults of the CE types may not directly cause the server to be down, the server may also be down along with the continuous accumulation of the faults of the CE types, in such a condition, operation and maintenance personnel of the server can be reminded to replace the internal memory of the server in advance and transfer data, so that the loss caused by the faults of the CE types is greatly reduced, and meanwhile, the stability of the server is greatly improved.
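The follow-up handling can be sketched as a mapping from the prediction result to recommended actions. The threshold of one predicted fault and the function name `recommend_actions` are assumptions of this sketch; the source does not state when a prediction counts as indicating that a fault will occur.

```python
from typing import Dict, List


def recommend_actions(predicted_counts: Dict[str, float],
                      threshold: float = 1.0) -> List[str]:
    """Map predicted per-type fault counts for the next period to actions.

    `threshold` is a hypothetical cut-off: a type is treated as predicted
    to occur when its predicted count reaches this value.
    """
    actions: List[str] = []
    if predicted_counts.get("UCE", 0.0) >= threshold:
        actions.append("alert operators and migrate service data to a standby server")
    if predicted_counts.get("CE", 0.0) >= threshold:
        actions.append("remind operators to replace the memory in advance and migrate data")
    return actions
```

In practice the CE rule would likely use a higher threshold than the UCE rule, since isolated correctable errors are handled by the hardware; that choice is left open here.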
In one exemplary embodiment, determining the prediction result of each type of fault occurring in the memory in the next period of the current period may include, but is not limited to, one of the following cases:
Case one: when the types include a first type and the history period includes N periods before the current period, the predicted number of times that the first type of fault occurs in the next period is determined according to N first historical times and the first current times, where N is 1 or N is a positive integer greater than or equal to 2, the first current times is the number of times that the first type of fault occurs in the current period, and the N first historical times include the number of times that the first type of fault occurs in each of the N periods.
Optionally, in this embodiment, when a fault of the first type (which may be, but is not limited to, the UCE type) occurs in the memory of the server, the predicted number of times that the first type of fault occurs in the memory in the next period may be determined according to, but not limited to, the number of times that the first type of fault occurs in the memory in the current period and in each of the N periods. A predicted number of times lets the operation and maintenance personnel of the server see directly and clearly how many faults may occur in the next period, which makes the prediction result much easier to understand.
Case two: when the types include the first type and the history period includes the N periods before the current period, the predicted probability that the first type of fault occurs in the memory in the next period is determined according to the N first historical times, N total historical times, the first current times and the current total times, where N is 1 or N is a positive integer greater than or equal to 2, the first current times is the number of times that the first type of fault occurs in the current period, the current total times is the total number of times that faults of all types occur in the current period, the N first historical times include the number of times that the first type of fault occurs in the memory in each of the N periods, and the N total historical times include the total number of times that faults of all types occur in the memory in each of the N periods.
Optionally, in this embodiment, when a fault of the first type (which may be, but is not limited to, the UCE type) occurs in the memory of the server and the predicted probability that the first type of fault occurs in the memory in the next period has been determined according to the N first historical times, the N total historical times, the first current times and the current total times, the predicted probability may be, but is not limited to being, presented directly to the operation and maintenance personnel of the server. Alternatively, the operation and maintenance personnel may set a probability reminder threshold for the predicted probability. When the predicted probability is greater than or equal to the probability reminder threshold, a UCE-type fault is very likely to occur in the memory in the next period, and the operation and maintenance personnel are prompted; when the predicted probability is less than the probability reminder threshold, a UCE-type fault is unlikely to occur in the memory in the next period, and the operation and maintenance personnel are not prompted. In this way, on the one hand, the predicted probability lets the operation and maintenance personnel see directly and clearly how likely the first type of fault is to occur in the memory in the next period; on the other hand, the probability reminder threshold set by the operation and maintenance personnel filters out the cases that do not need a prompt, which makes prompting more efficient.
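The threshold comparison described above can be sketched in a few lines of Python. This is only an illustration: the function name `should_prompt` and its parameter names are hypothetical and do not come from the application.

```python
def should_prompt(predicted_probability: float, reminder_threshold: float) -> bool:
    """Return True when operation and maintenance personnel should be prompted.

    Both arguments are probabilities in [0, 1]; the threshold is the value
    configured by the operation and maintenance personnel (hypothetical name).
    """
    for value in (predicted_probability, reminder_threshold):
        if not 0.0 <= value <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    # Prompt when the prediction is greater than or equal to the threshold.
    return predicted_probability >= reminder_threshold
```

A prediction exactly equal to the threshold triggers a prompt, matching the "greater than or equal to" wording of the embodiment.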
In one exemplary embodiment, the predicted number of times that the first type of fault occurs in the next cycle may be determined, but is not limited to, by: determining a first fitting curve in a target coordinate system according to the N first historical times corresponding to the N cycles and the first current times corresponding to the current cycle, wherein the abscissa of the target coordinate system represents time, the ordinate of the target coordinate system represents the times of the first type of fault occurring in the memory, and the first fitting curve at least comprises points corresponding to the N cycles, the current cycle and the next cycle respectively; and determining the value of the point on the ordinate corresponding to the next period on the first fitting curve as the prediction frequency.
Optionally, in this embodiment, the predicted number of times may be determined according to, but not limited to, the first fitted curve. For example, as shown in Table 1, the predicted number of times that a UCE-type fault occurs in the next period T4 may be determined from the number of times that a UCE-type fault occurs in the historical period T1, in the historical period T2, and in the current period T3.
TABLE 1
Period | Number of occurrences of UCE-type faults
T1 (the historical period mentioned above) | 3 (the first historical times mentioned above)
T2 (the historical period mentioned above) | 5 (the first historical times mentioned above)
T3 (the current period mentioned above) | 7 (the first current times mentioned above)
FIG. 5 is a schematic diagram of determining the predicted number of times according to an embodiment of the present application. As shown in FIG. 5, the number of times that a UCE-type fault occurs in the historical period T1, in the historical period T2, and in the current period T3 may be plotted in the target coordinate system, with time as the abscissa and the number of times that the first type of fault occurs in the memory as the ordinate, and the first fitted curve may be fitted to these points. The ordinate value (which may be, but is not limited to, 9) of the point on the first fitted curve corresponding to the next period T4 is determined as the predicted number of times.
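As a minimal sketch of this step, the following Python code fits a straight line by ordinary least squares to the per-period counts and reads off the value for the next period. The application does not state which family of curves is fitted, so the straight line is an assumption, and the names `fit_line` and `predict_next_count` are hypothetical.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares line through the points; returns (slope, intercept)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_next_count(counts: Sequence[float]) -> float:
    """counts[i] is the fault count of period T(i+1); the last entry is the current period."""
    periods = list(range(1, len(counts) + 1))  # abscissa: period index as time
    slope, intercept = fit_line(periods, counts)
    next_period = len(counts) + 1
    # A fault count cannot be negative, so clamp the extrapolated value at zero.
    return max(0.0, intercept + slope * next_period)


print(predict_next_count([3, 5, 7]))  # counts of T1..T3 from Table 1 -> 9.0
```

With the Table 1 counts the line passes exactly through all three points, and the value for T4 is 9, which agrees with the example value given for FIG. 5.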
In an exemplary embodiment, the first fitted curve may be obtained by, but is not limited to, correcting a historical fitted curve: when a historical fitted curve exists in the target coordinate system, the historical fitted curve is corrected according to the N first historical times corresponding to the N periods and the first current times corresponding to the current period to obtain the first fitted curve, where the historical fitted curve is a fitted curve determined according to at least part of the N first historical times and the corresponding periods among the N periods.
Optionally, in this embodiment, the historical fitted curve may be corrected in, but is not limited to, the following manner: a historical fitted curve is determined in the target coordinate system according to the historical times corresponding to part of the N historical periods, where N is 1 or N is a positive integer greater than or equal to 2, the abscissa of the target coordinate system represents time, the ordinate represents the number of times that the first type of fault occurs in the memory, and the historical fitted curve includes at least the points corresponding to those partial historical periods; the line segments of the historical fitted curve other than those corresponding to the partial historical periods are then corrected using the historical times corresponding to the remaining historical periods among the N historical periods and the first current times corresponding to the current period, to obtain the first fitted curve.
Optionally, in this embodiment, correcting the historical fitted curve may include, but is not limited to, one of the following cases:
Case one: the curve may be, but is not limited to being, re-fitted in full according to the number of times that the first type of fault occurred in the historical periods and in the current period.
Case two: the line segment of the historical fitted curve corresponding to the partial historical periods is kept, and the remaining line segment is re-fitted according to the number of times that the first type of fault occurred in the other historical periods among the N historical periods and in the current period. FIG. 6 is a schematic diagram of correcting a historical fitted curve according to an embodiment of the present application. As shown in FIG. 6, the historical periods may include, but are not limited to, T1, T2, ..., Tm, ..., T(N-1), the current period is TN, and the number of times that the first type of fault occurs in the memory in period T(N+1) is to be predicted. The historical fitted curve may be fitted from, but is not limited to, the periods T1, T2, ..., Tm, with time as the abscissa and the number of times that the first type of fault occurs in the memory as the ordinate in the target coordinate system. The historical fitted curve may then be corrected according to, but not limited to, the actual number of times that the first type of fault occurred in the memory in periods T(m+1) to TN to obtain the first fitted curve, and the number of times that the first type of fault occurs in the memory in period T(N+1) is predicted according to the first fitted curve. Correcting the historical fitted curve with the number of first-type faults that actually occurred in the memory makes the first fitted curve more reasonable.
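The two correction cases can be contrasted with a short Python sketch. It rests on assumptions that the application does not confirm: each segment is modelled as a least-squares straight line, case two is read as fitting the corrected segment only to periods T(m+1) to TN, and the function names and sample counts are hypothetical.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_after_correction(counts: Sequence[float], m: int,
                             keep_history_segment: bool) -> float:
    """Predict the count for period T(N+1).

    counts[i] is the count of period T(i+1); the last entry is the current
    period TN. The historical fitted curve is assumed to cover T1..Tm.
    """
    n = len(counts)
    periods = list(range(1, n + 1))
    if keep_history_segment:
        # Case two: leave the T1..Tm segment alone and fit the corrected
        # segment only to the later periods T(m+1)..TN.
        xs, ys = periods[m:], list(counts[m:])
    else:
        # Case one: discard the old curve and re-fit all periods T1..TN.
        xs, ys = periods, list(counts)
    slope, intercept = fit_line(xs, ys)
    return intercept + slope * (n + 1)


sample = [2, 4, 6, 9, 12]  # hypothetical counts for T1..T5, with m = 3
print(predict_after_correction(sample, 3, keep_history_segment=False))
print(predict_after_correction(sample, 3, keep_history_segment=True))
```

In this sketch case two needs at least two periods after Tm; with fewer, the corrected segment cannot be fitted and `fit_line` raises an error.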
In one exemplary embodiment, the predicted probability of the occurrence of the first type of fault within the next cycle may be determined, but is not limited to, by: determining a second fitted curve in a target coordinate system according to N first historical probabilities corresponding to the N cycles and the first current probability corresponding to the current cycle, wherein an abscissa of the target coordinate system represents time, an ordinate of the target coordinate system represents probability of the first type of fault occurring in the memory, the second fitted curve at least includes points corresponding to the N cycles, the current cycle and the next cycle, respectively, the N first historical probabilities are ratios of respective first historical times of the N first historical times to corresponding historical total times of the N historical total times, and the first current probability is a ratio of the first current times to the current total times; and determining the value of the point corresponding to the next period on the second fitting curve on the ordinate as the prediction probability.
Optionally, in this embodiment, the predicted probability may be determined according to, but not limited to, the second fitted curve. For example, as shown in Table 2, the predicted probability that a UCE-type fault occurs in the next period T4 may be determined from the number of UCE-type faults and the total number of faults of all types in each of the historical periods T1 and T2, together with the number of UCE-type faults and the total number of faults of all types in the current period T3.
TABLE 2
Figure BDA0003919437830000161
FIG. 7 is a schematic diagram of determining the predicted probability according to an embodiment of the present application. As shown in FIG. 7, for each of the historical periods T1 and T2, the ratio of the number of UCE-type faults in that period to the total number of faults of all types in that period may be determined as the first historical probability (for example, but not limited to, 0.5 for T1 and 0.5 for T2), and the ratio of the number of UCE-type faults in the current period T3 to the total number of faults of all types in T3 may be determined as the first current probability (which may be, but is not limited to, 0.35). With time as the abscissa and the probability that a UCE-type fault occurs in the memory as the ordinate in the target coordinate system, the second fitted curve is fitted to these points, and the ordinate value (which may be, but is not limited to, 0.25) of the point on the second fitted curve corresponding to the next period T4 is determined as the predicted probability.
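The probability variant can be sketched in the same way: compute one ratio per period, fit a curve, and read off the next period. The code below assumes a least-squares straight line and uses hypothetical counts chosen only to reproduce the example probabilities 0.5, 0.5 and 0.35; the function name is also hypothetical.

```python
from typing import Sequence


def predict_next_probability(fault_counts: Sequence[int],
                             total_counts: Sequence[int]) -> float:
    """Fit a line to per-period ratios and extrapolate one period ahead."""
    if len(fault_counts) != len(total_counts) or len(fault_counts) < 2:
        raise ValueError("need matching counts for at least two periods")
    if any(total <= 0 for total in total_counts):
        raise ValueError("each period needs a positive total fault count")
    # First historical probabilities and first current probability.
    probs = [f / t for f, t in zip(fault_counts, total_counts)]
    xs = list(range(1, len(probs) + 1))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(probs) / len(probs)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (p - mean_y) for x, p in zip(xs, probs))
    slope = sxy / sxx
    value = mean_y + slope * (len(probs) + 1 - mean_x)
    # A probability must stay inside [0, 1] after extrapolation.
    return min(1.0, max(0.0, value))


# Hypothetical counts giving the ratios 0.5, 0.5 and 0.35 for T1..T3.
print(predict_next_probability([3, 5, 7], [6, 10, 20]))
```

A straight line through 0.5, 0.5 and 0.35 gives about 0.30 for T4 rather than the 0.25 read from FIG. 7; the difference comes from the unspecified curve family, not from the ratios themselves.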
In an exemplary embodiment, when the types include a second type, determining the prediction result of each type of fault occurring in the memory in the next period of the current period may include, but is not limited to, one of the following cases:
Case one: when the types include the second type and the history period includes N periods before the current period, the predicted weighted cumulative number of times that the second type of fault occurs in the next period is determined according to N second historical times and the second current times, where N is 1 or N is a positive integer greater than or equal to 2, the second current times is the number of times that the second type of fault occurs in the current period, and the N second historical times include the number of times that the second type of fault occurs in each of the N periods.
Optionally, in this embodiment, when a fault of the second type (which may be, but is not limited to, the CE type) occurs in the memory of the server, the predicted weighted cumulative number of times that the second type of fault occurs in the memory in the next period may be determined according to the number of times that the second type of fault occurs in the memory in the current period and in each of the N periods. A weighted cumulative number of times determined in this way reflects the superimposed, cumulative effect on the server memory of the second-type faults that occurred in the historical periods and in the current period, which makes the weighted cumulative number of times more reasonable.
Case two: when the types include the second type and the history period includes N periods before the current period, the predicted weighted cumulative probability that the second type of fault occurs in the memory in the next period is determined according to the N second historical times, N total historical times, the second current times and the current total times, where N is 1 or N is a positive integer greater than or equal to 2, the second current times is the number of times that the second type of fault occurs in the current period, the current total times is the total number of times that faults of all types occur in the current period, the N second historical times include the number of times that the second type of fault occurs in the memory in each of the N periods, and the N total historical times include the total number of times that faults of all types occur in the memory in each of the N periods.
Optionally, in this embodiment, when a fault of the second type (which may be, but is not limited to, the CE type) occurs in the memory of the server and the predicted weighted cumulative probability that the second type of fault occurs in the memory in the next period has been determined, the weighted cumulative probability may be, but is not limited to being, presented directly to the operation and maintenance personnel of the server. Alternatively, the operation and maintenance personnel may set a cumulative probability threshold for the weighted cumulative probability. When the weighted cumulative probability is greater than or equal to the cumulative probability threshold, a CE-type fault is very likely to occur in the memory in the next period, and the operation and maintenance personnel are prompted; when the weighted cumulative probability is less than the cumulative probability threshold, a CE-type fault is unlikely to occur in the next period, and the operation and maintenance personnel are not prompted. In this way, on the one hand, the weighted cumulative probability lets the operation and maintenance personnel see directly and clearly how likely the second type of fault is to occur in the memory in the next period; on the other hand, the cumulative probability threshold set by the operation and maintenance personnel filters out the cases that do not need a prompt, which makes prompting more efficient.
In one exemplary embodiment, the predicted weighted cumulative number of occurrences of the second type of fault in the next cycle may be determined by, but is not limited to: determining N historical weighted accumulation times according to the N second historical times corresponding to the N periods and N corresponding predetermined first historical weights; determining the current weighting accumulated times according to the second current times corresponding to the current period and the corresponding predetermined first current weight; determining a third fitting curve under a target coordinate system according to the N historical weighted accumulation times and the current weighted accumulation times, wherein the abscissa of the target coordinate system represents time, the ordinate of the target coordinate system represents the weighted accumulation times of the second type of fault occurring in the memory, and the third fitting curve at least comprises points corresponding to the N periods, the current period and the next period respectively; and determining the value of the point on the ordinate corresponding to the next period on the third fitting curve as the prediction weighted accumulation frequency.
Optionally, in this embodiment, the first current weight may be determined in, but is not limited to, the following manner. The first current weight may be, but is not limited to being, associated with the N first historical weights, and each first historical weight may be, but is not limited to being, related to the number of times that the second type of fault occurs in each of the N historical periods and the total number of times that faults of all types occur in each of the N historical periods. For ease of understanding, the process of determining the first historical weight and the first current weight may be illustrated as follows. The second-type faults that occur in the memory of the server in the historical period T4 are divided by time of occurrence, for example, but not limited to, 8 occurrences in the morning (4:00 to 12:00), 4 occurrences in the afternoon (12:00 to 20:00), and 2 occurrences in the evening (from 20:00); the total number of faults of all types in the memory in the historical period T4 is 25. In this case, the ratio (8/25 = 0.32) of the number of CE-type faults in the morning to the total number of faults of all types may be determined as a first historical weight, the ratio (4/25 = 0.16) of the number of CE-type faults in the afternoon to the total number of faults of all types may be determined as a first historical weight, and the ratio (2/25 = 0.08) of the number of CE-type faults in the evening to the total number of faults of all types may be determined as a first historical weight. These weights (0.32, 0.16 and 0.08) may be used as, but are not limited to being used as, the weights of the number of CE-type faults that occur in the memory in the morning, afternoon or evening of the current period T5.
When the memory fault information of the current period T5 is obtained, the second-type faults may likewise be divided by time of occurrence, for example, but not limited to, 6 occurrences in the morning (4:00 to 12:00), 2 occurrences in the afternoon and 1 occurrence in the evening; the total number of faults of all types in the memory in the current period T5 is 30. The first current weight for the morning may then be determined as, but is not limited to, the ratio (14/55 ≈ 0.25) of the sum (14) of the number of CE-type faults in the morning of the historical period T4 (8) and in the morning of the current period T5 (6) to the sum (55) of the total number of faults of all types in T4 (25) and in T5 (30). Likewise, the first current weight for the afternoon may be determined as the ratio (6/55 ≈ 0.11) of the sum (6) of the number of CE-type faults in the afternoon of T4 (4) and of T5 (2) to the same sum (55), and the first current weight for the evening may be determined as the ratio (3/55 ≈ 0.05) of the sum (3) of the number of CE-type faults in the evening of T4 (2) and of T5 (1) to the same sum (55). These weights (0.25, 0.11 and 0.05) may be used as, but are not limited to being used as, the weights of the number of CE-type faults that occur in the memory in the morning, afternoon or evening of the next period T6.
Similarly, when the memory fault information of the period T6 is obtained, the second-type faults may be divided by time of occurrence, for example, but not limited to, 3 occurrences in the morning (4:00 to 12:00) and 5 occurrences in the afternoon (12:00 to 20:00); the total number of faults of all types in the memory in the period T6 is 40. The weights (0.18, 0.12 and 0.18) of the number of CE-type faults that occur in the memory in the morning, afternoon or evening of the period T7 following T6 can then be calculated in the manner described above. The numbers of CE-type faults in the morning, afternoon and evening of periods T4 to T6, together with the corresponding weights, are summarized in Table 3 below.
TABLE 3
Figure BDA0003919437830000201
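The weight calculation walked through above can be sketched as a running ratio: for each time slot, the cumulative number of CE-type faults in that slot divided by the cumulative total of faults of all types. The function name `slot_weights` is hypothetical, and only periods T4 and T5 are used because the evening count of T6 is not stated in the text.

```python
from typing import List, Sequence


def slot_weights(slot_counts_per_period: Sequence[Sequence[int]],
                 totals_per_period: Sequence[int]) -> List[List[float]]:
    """Return one weight per time slot after each period.

    slot_counts_per_period[i] holds the CE-type counts of one period split
    into time slots (morning, afternoon, evening); totals_per_period[i] is
    the total number of faults of all types in that period.
    """
    cumulative_slots: List[int] = []
    cumulative_total = 0
    weights: List[List[float]] = []
    for slots, total in zip(slot_counts_per_period, totals_per_period):
        if not cumulative_slots:
            cumulative_slots = [0] * len(slots)
        # Accumulate the per-slot counts and the overall total to date.
        cumulative_slots = [a + b for a, b in zip(cumulative_slots, slots)]
        cumulative_total += total
        weights.append([c / cumulative_total for c in cumulative_slots])
    return weights


# Periods T4 and T5 from the worked example: (morning, afternoon, evening).
after_t4, after_t5 = slot_weights([[8, 4, 2], [6, 2, 1]], [25, 30])
print([round(w, 2) for w in after_t4])
print([round(w, 2) for w in after_t5])
```

The first list corresponds to 8/25, 4/25 and 2/25, and the second to 14/55, 6/55 and 3/55, the ratios used for the weights applied to the following period.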
Optionally, in this embodiment, the predicted weighted cumulative number of times may be determined according to, but not limited to, the third fitted curve, and the process may be illustrated with the contents of Table 4. As shown in Table 4, with T1 as the initial period and T3 as the current period, the weighted cumulative number of times is 3 for period T1, 3 + 5 × 0.4 = 5 for period T2, and 3 + 5 × 0.4 + 13 × 0.8 = 15.4 ≈ 15 for period T3. The third fitted curve in the target coordinate system may be determined according to, but not limited to, the historical weighted cumulative numbers of times corresponding to periods T1 and T2 and the current weighted cumulative number of times corresponding to period T3. FIG. 8 is a schematic diagram of determining the predicted weighted cumulative number of times according to an embodiment of the present application. As shown in FIG. 8, the abscissa of the target coordinate system represents time and the ordinate represents the weighted cumulative number of times that the second type of fault occurs in the memory; the ordinate value (18) of the point on the third fitted curve corresponding to the next period T4 may be determined as, but is not limited to, the predicted weighted cumulative number of times.
TABLE 4
Figure BDA0003919437830000211
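The accumulation step of the Table 4 example can be sketched as a running sum of count times weight. The counts 3, 5 and 13 and the weights 0.4 and 0.8 are read from the worked example; the weight 1.0 for the initial period T1 is an assumption made so that its weighted cumulative value equals its raw count, and the function name is hypothetical.

```python
from typing import List, Sequence


def weighted_cumulative_counts(counts: Sequence[float],
                               weights: Sequence[float]) -> List[float]:
    """Running sum of count * weight, one value per period."""
    if len(counts) != len(weights):
        raise ValueError("counts and weights must have the same length")
    running = 0.0
    series: List[float] = []
    for count, weight in zip(counts, weights):
        running += count * weight  # each period adds its weighted count
        series.append(running)
    return series


# T1..T3 counts with assumed weight 1.0 for T1 and the example weights after it.
series = weighted_cumulative_counts([3, 5, 13], [1.0, 0.4, 0.8])
print([round(value, 1) for value in series])
```

The resulting series is the set of ordinate values to which the third fitted curve is fitted; the fit itself depends on the curve family, which the embodiment leaves open.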
In one exemplary embodiment, the predicted weighted cumulative probability of the occurrence of the second type of fault in the next cycle may be determined by, but is not limited to: determining N historical weighted cumulative probabilities according to N second historical probabilities corresponding to the N periods and N predetermined second historical weights, wherein the N second historical probabilities are ratios of each second historical time in the N second historical times to corresponding historical total times in the N historical total times; determining a current weighted cumulative probability according to the N second historical probabilities, the N second historical weights, a second current probability corresponding to the current period and a corresponding predetermined second current weight, wherein the second current probability is the ratio of the second current times to the current total times; determining a fourth fitted curve under a target coordinate system according to the N historical weighted cumulative probabilities and the current weighted cumulative probability, wherein the abscissa of the target coordinate system represents time, the ordinate of the target coordinate system represents the weighted cumulative probability of the second type of fault occurring in the memory, and the fourth fitted curve at least comprises points corresponding to the N periods, the current period and the next period respectively; and determining the value of the point on the ordinate, corresponding to the next period, on the fourth fitting curve as the prediction weighted accumulation probability.
Optionally, in this embodiment, the predicted weighted cumulative probability may be determined according to, but not limited to, the fourth fitted curve, and the process of determining the predicted weighted cumulative probability may be illustrated with, but not limited to, the contents of Table 5. As shown in Table 5, with T1 as the initial period and T3 as the current period, the weighted cumulative probability is 3/6 × 0.2 = 0.1 for period T1 and (3/6 × 0.2 + 5/10 × 0.3) × (3 + 5)/(6 + 10) = 0.125 for period T2; for period T3 it is computed from 3/6 × 0.2 + 5/10 × 0.3 + 13/20 × 0.5 and (3 + 5 + 13)/(6 + 10 + 20) and is given as 0.5.
The fourth fitted curve in the target coordinate system may be determined from, but is not limited to, the weighted cumulative probabilities corresponding to periods T1, T2 and T3. FIG. 9 is a schematic diagram of determining the predicted weighted cumulative probability according to an embodiment of the present application. As shown in FIG. 9, the abscissa of the target coordinate system represents time, the ordinate represents the weighted cumulative probability that the second type of fault occurs in the memory, and the ordinate value (0.375) of the point on the fourth fitted curve corresponding to period T4 is determined as the predicted weighted cumulative probability.
TABLE 5
Figure BDA0003919437830000221
Figure BDA0003919437830000231
To make the memory fault prediction method easier to understand, the prediction process is explained below with reference to an optional embodiment, which may be, but is not limited to being, used in the embodiments of the present application.
FIG. 10 is a schematic diagram of a memory fault prediction method according to an embodiment of the present application. As shown in FIG. 10, the BMC may, but is not limited to, interact with the BIOS through a related data structure, and may record the memory fault information that the BIOS reports to the SRAM for the BMC to obtain. The memory processing controller (Memory Handle Controller) in the BMC may, but is not limited to, parse the memory fault information reported by the BIOS and determine the type of fault that occurred in the memory of the server.
When a CE or UCE error is determined, the memory processing controller marks it, stores the data in the flash memory, records the memory silkscreen information (which may include, but is not limited to, the manufacturer, model and SN (Serial Number)), specially marks the fault points where UCE errors occurred, and prompts the user to pay attention to them. Once the mark has been cleared (for example, after the memory module is replaced), the fault point is no longer tracked; if a memory module of the same type is installed as the replacement, it is also marked and the user is prompted to pay attention to it.
The memory processing controller may analyze the memory by combining the reported memory fault information with other information such as the MCE (Machine Check Exception) and the SEL (System Error Log), fit a curve for CE-type or UCE-type faults, and determine from the fitted curve the prediction result of a CE-type or UCE-type fault occurring in the memory of the server in the next period. The analysis may also indicate, for example, which manufacturers' memory models have a higher failure rate and which memory channels are prone to CE faults. A possible memory failure rate is to be given, and the maximum fault threshold is adjusted according to the CE count and the user configuration.
The memory processing controller reorganizes the analysis results and reports them to a front-end Web (World Wide Web) interface, where the prediction result is plotted and displayed for the operation and maintenance personnel of the server to use in decision-making.
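The bookkeeping part of this flow, counting faults per period and per type and singling out modules with UCE errors, can be sketched as follows. Everything in this block is hypothetical: the `FaultRecord` fields, the serial numbers and the function names are illustrative stand-ins, not the BMC data structures or interfaces.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class FaultRecord:
    """One parsed memory fault report (hypothetical layout)."""
    period: int        # index of the period in which the fault occurred
    fault_type: str    # "CE" or "UCE"
    module_sn: str     # serial number taken from the memory silkscreen


def count_by_period_and_type(records: Iterable[FaultRecord]) -> Dict[int, Counter]:
    """Number of faults of each type in each period."""
    counts: Dict[int, Counter] = {}
    for record in records:
        counts.setdefault(record.period, Counter())[record.fault_type] += 1
    return counts


def modules_with_uce(records: Iterable[FaultRecord]) -> List[str]:
    """Serial numbers of modules that need special marking for UCE errors."""
    return sorted({r.module_sn for r in records if r.fault_type == "UCE"})


records = [
    FaultRecord(1, "CE", "SN-A"),
    FaultRecord(1, "UCE", "SN-B"),
    FaultRecord(1, "CE", "SN-A"),
    FaultRecord(2, "CE", "SN-C"),
]
counts = count_by_period_and_type(records)
print(counts[1]["CE"], counts[1]["UCE"], counts[2]["CE"])
print(modules_with_uce(records))
```

The per-period counters produced here are the inputs that the curve-fitting steps described earlier would consume.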
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present application.
In this embodiment, a prediction apparatus for memory failure is further provided. The apparatus is used to implement the foregoing embodiments and preferred embodiments, and what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the embodiments below is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 11 is a block diagram illustrating a structure of a memory failure prediction apparatus according to an embodiment of the present application, where as shown in fig. 11, the apparatus includes:
an obtaining module 1102, configured to obtain memory fault information in a current period, where the memory fault information includes a type of a fault occurring in the current period in a memory of a server;
a first determining module 1104, configured to determine, according to the memory fault information, the number of times that each type of fault occurs in the current period in the memory;
a second determining module 1106, configured to determine, according to the number of times that the various types of faults occur in the current period in the memory and the number of times that the various types of faults occur in the history period in the memory, a prediction result of the various types of faults occurring in the memory in the next period of the current period, where the history period includes one or more periods before the current period.
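The three modules listed above can be sketched as three small functions chained together. This is a rough illustration under stated assumptions: the function names, the event layout, and the placeholder predictor `last_delta` are all hypothetical, and the application itself determines the prediction from a fitted curve rather than from this placeholder.

```python
from collections import Counter

def obtain_fault_info(raw_events):
    """Obtaining module: keep the fault type of each event in the current period."""
    return [e["type"] for e in raw_events]

def count_by_type(fault_types):
    """First determining module: occurrences of each fault type in the period."""
    return Counter(fault_types)

def predict_next(current_counts, history_counts, predictor):
    """Second determining module: per-type prediction from history plus current."""
    types = set(current_counts)
    for h in history_counts:
        types |= set(h)
    return {
        t: predictor([h.get(t, 0) for h in history_counts] + [current_counts.get(t, 0)])
        for t in types
    }

def last_delta(series):
    """Placeholder predictor (an assumption): extend the most recent change."""
    if len(series) < 2:
        return float(series[-1])
    return max(0.0, float(series[-1] + (series[-1] - series[-2])))

history = [{"CE": 2, "UCE": 0}, {"CE": 4, "UCE": 1}]   # two earlier periods
events = [{"type": "CE"}] * 6 + [{"type": "UCE"}]      # current period
current = count_by_type(obtain_fault_info(events))
prediction = predict_next(current, history, last_delta)
```

Passing the predictor in as an argument keeps the counting step independent of whichever fitting method is chosen for the second determining module.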
Optionally, the second determining module includes:
a first determining unit, configured to determine, when the respective types include a first type and the history period includes N periods before the current period, a predicted number of times that the fault of the first type occurs in the next period according to N first history times and a first current time, where N is 1 or N is a positive integer greater than or equal to 2, the first current time is a number of times that the fault of the first type occurs in the current period, and the N first history times include a number of times that the fault of the first type occurs in each of the N periods; or
a second determining unit, configured to determine, when the respective types include the first type and the history period includes N periods before the current period, a predicted probability that the fault of the first type occurs in the next period according to the N first history times, N total history times, the first current times, and the current total times, where N is 1 or N is a positive integer greater than or equal to 2, the first current times is a number of times that the fault of the first type occurs in the current period, the current total times is a total number of times that the fault of the respective types occurs in the current period, the N first history times includes a number of times that the fault of the first type occurs in each of the N periods, and the N total history times includes a total number of times that the fault of the respective types occurs in each of the N periods.
Optionally, the first determining unit is configured to:
determining a first fitting curve under a target coordinate system according to the N first historical times corresponding to the N cycles and the first current times corresponding to the current cycle, wherein the abscissa of the target coordinate system represents time, the ordinate of the target coordinate system represents the times of the first type of fault occurring in the memory, and the first fitting curve at least comprises points corresponding to the N cycles, the current cycle and the next cycle respectively;
and determining the value of the point corresponding to the next period on the first fitting curve on the ordinate as the prediction times.
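The two steps above can be sketched as follows. The application does not fix the form of the fitting curve, so this sketch assumes a straight line fitted by ordinary least squares, with the period index standing in for time; `fit_line` and `predict_count` are hypothetical names.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:  # a single point: fall back to a flat line
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x

def predict_count(history_counts, current_count):
    """Fit the N historical counts plus the current count, read off the next period."""
    ys = list(history_counts) + [current_count]
    xs = list(range(len(ys)))          # period index stands in for time
    a, b = fit_line(xs, ys)
    return max(0.0, a * len(ys) + b)   # a count cannot be negative

predicted = predict_count([3, 5, 7], 9)   # N = 3 historical periods
```

A higher-order polynomial or another curve family could be substituted for `fit_line` without changing the surrounding steps.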
Optionally, the first determining unit is further configured to:
and under the condition that a history fitting curve exists in the target coordinate system, correcting the history fitting curve according to the N first history times corresponding to the N periods and the first current time corresponding to the current period to obtain the first fitting curve, wherein the history fitting curve is a fitting curve determined according to at least part of the N first history times and the periods corresponding to the at least part of the N periods.
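One possible way to correct an existing fitted curve is to keep the fit as running sums and update it when the count for a new period arrives. The sketch below assumes a least-squares straight line; the class `RunningLineFit` is hypothetical, and the application does not specify how the correction is carried out.

```python
class RunningLineFit:
    """Least-squares line kept as running sums, so new periods can correct it."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def add(self, x, y):
        """Fold one (period, count) point into the running sums."""
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def coefficients(self):
        """Return (slope, intercept); a flat line if the slope is undetermined."""
        if self.n == 0:
            return 0.0, 0.0
        denom = self.n * self.sxx - self.sx ** 2
        if denom == 0:
            return 0.0, self.sy / self.n
        a = (self.n * self.sxy - self.sx * self.sy) / denom
        return a, (self.sy - a * self.sx) / self.n

    def predict(self, x):
        a, b = self.coefficients()
        return a * x + b

fit = RunningLineFit()
for period, count in enumerate([2, 4, 6]):   # history curve from earlier periods
    fit.add(period, count)
before = fit.predict(4)
fit.add(3, 11)                                # the current period corrects the curve
after = fit.predict(4)
```

Here the history curve alone would predict 10 faults for period 4, while the corrected curve, which has seen the jump in the current period, predicts 13.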
Optionally, the second determining unit is configured to:
determining a second fitting curve under a target coordinate system according to N first historical probabilities corresponding to the N periods and the first current probability corresponding to the current period, wherein an abscissa of the target coordinate system represents time, an ordinate of the target coordinate system represents probability of the first type of fault occurring in the memory, the second fitting curve at least includes points corresponding to the N periods, the current period and the next period respectively, the N first historical probabilities are ratios of each of the N first historical times to a corresponding historical total number of the N historical total times, and the first current probability is a ratio of the first current time to the current total number of times;
and determining the value of the point corresponding to the next period on the second fitting curve on the ordinate as the prediction probability.
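The probability variant can be sketched in the same way: each per-period probability is the ratio of the first-type count to the total count, and the fitted curve is read off at the next period. The straight-line fit, the clamping to the range 0 to 1, and the handling of a period with no faults are assumptions made for this sketch.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x

def predict_probability(first_counts, total_counts):
    """Both lists hold the N historical periods followed by the current period."""
    # Ratio of first-type faults to all faults; 0.0 when a period had no faults.
    probs = [c / t if t else 0.0 for c, t in zip(first_counts, total_counts)]
    xs = list(range(len(probs)))
    a, b = fit_line(xs, probs)
    # Clamp so the extrapolated value remains a valid probability.
    return min(1.0, max(0.0, a * len(probs) + b))

p_next = predict_probability([1, 2, 3], [10, 10, 10])
```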
Optionally, the second determining module includes:
a third determining unit, configured to determine, according to N second historical times and a second current time, a predicted weighted cumulative number of times that a fault of the second type occurs in the next cycle in the memory when the respective types include a second type and the historical cycle includes N cycles before the current cycle, where N is 1 or N is a positive integer greater than or equal to 2, the second current time is a number of times that a fault of the second type occurs in the current cycle in the memory, and the N second historical times includes a number of times that a fault of the second type occurs in each of the N cycles in the memory; or
a fourth determining unit, configured to determine, according to the N second historical times, N total historical times, the second current times, and the current total times, a predicted weighted cumulative probability that the second type of fault occurs in the next cycle when the respective types include the second type, and the historical cycle includes the N cycles before the current cycle, where N is 1 or N is a positive integer greater than or equal to 2, the second current times is a number of times that the second type of fault occurs in the current cycle, the current total times is a total number of times that the respective types of faults occur in the current cycle, the N second historical times includes a number of times that the second type of fault occurs in each of the N cycles, and the N total historical times includes a total number of times that the respective types of faults occur in each of the N cycles.
Optionally, the third determining unit is configured to:
determining N historical weighted accumulation times according to the N second historical times corresponding to the N periods and N corresponding predetermined first historical weights;
determining the current weighting accumulated times according to the second current times corresponding to the current period and the corresponding predetermined first current weight;
determining a third fitting curve under a target coordinate system according to the N historical weighted accumulation times and the current weighted accumulation times, wherein the abscissa of the target coordinate system represents time, the ordinate of the target coordinate system represents the weighted accumulation times of the second type of fault occurring in the memory, and the third fitting curve at least comprises points corresponding to the N periods, the current period and the next period respectively;
and determining the value of the point on the ordinate, corresponding to the next period, on the third fitting curve as the prediction weighted accumulation frequency.
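The weighted accumulation steps above can be sketched as follows. This section does not spell out the accumulation formula, so the sketch assumes one plausible reading: the value for a period is the running sum of weight times count up to and including that period. The straight-line fit and all function names are likewise assumptions.

```python
from itertools import accumulate

def weighted_cumulative(counts, weights):
    """Running sum of weight_i * count_i, one value per period (assumed reading)."""
    return list(accumulate(w * c for w, c in zip(weights, counts)))

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x

def predict_weighted_cumulative(hist_counts, hist_weights, cur_count, cur_weight):
    """Fit the weighted cumulative counts and read off the next period."""
    ys = weighted_cumulative(
        list(hist_counts) + [cur_count],
        list(hist_weights) + [cur_weight],
    )
    a, b = fit_line(list(range(len(ys))), ys)
    return a * len(ys) + b

predicted = predict_weighted_cumulative([1, 2], [0.5, 1.0], 3, 2.0)
```

Giving more recent periods a larger weight, as in the example call, makes the cumulative curve rise faster when faults have become more frequent lately.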
Optionally, the fourth determining unit is configured to:
determining N historical weighted cumulative probabilities according to N second historical probabilities corresponding to the N periods and N predetermined second historical weights, wherein the N second historical probabilities are ratios of each second historical time of the N second historical times to corresponding historical total times of the N historical total times;
determining a current weighted cumulative probability according to the N second historical probabilities, the N second historical weights, a second current probability corresponding to the current period and a corresponding predetermined second current weight, wherein the second current probability is the ratio of the second current times to the current total times;
determining a fourth fitted curve under a target coordinate system according to the N historical weighted cumulative probabilities and the current weighted cumulative probability, wherein the abscissa of the target coordinate system represents time, the ordinate of the target coordinate system represents the weighted cumulative probability of the second type of fault occurring in the memory, and the fourth fitted curve at least comprises points corresponding to the N periods, the current period and the next period respectively;
and determining the value of the point on the ordinate corresponding to the next period on the fourth fitting curve as the prediction weighted accumulation probability.
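The weighted cumulative probability can be sketched along the same lines. Consistent with the description that the current value depends on the N historical probabilities and weights as well as the current ones, this sketch assumes the value for a period is the running sum of weight times probability. The straight-line fit and the function names are assumptions; note that with weights summing to more than 1 the result is a score rather than a true probability.

```python
from itertools import accumulate

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x

def predict_weighted_cumulative_probability(second_counts, total_counts, weights):
    """All lists hold the N historical periods followed by the current period."""
    # Per-period probability: second-type faults over all faults in that period.
    probs = [c / t if t else 0.0 for c, t in zip(second_counts, total_counts)]
    # Weighted cumulative probability per period (assumed running-sum reading).
    ys = list(accumulate(w * p for w, p in zip(weights, probs)))
    a, b = fit_line(list(range(len(ys))), ys)
    return a * len(ys) + b

p_next = predict_weighted_cumulative_probability(
    [1, 2, 3], [10, 10, 10], [0.2, 0.3, 0.5]
)
```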
It should be noted that the above modules may be implemented by software or hardware. In the latter case, this may be achieved in, but is not limited to, the following ways: the modules are all located in the same processor; or the modules are located in different processors in any combination.
Embodiments of the present application further provide a computer-readable storage medium having a computer program stored therein, wherein the computer program is configured to perform the steps in any of the above method embodiments when executed.
In an exemplary embodiment, the computer-readable storage medium may include, but is not limited to: various media capable of storing computer programs, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Embodiments of the present application further provide an electronic device comprising a memory having a computer program stored therein and a processor configured to execute the computer program to perform the steps in any of the above method embodiments.
In an exemplary embodiment, the electronic device may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
For specific examples in this embodiment, reference may be made to the examples described in the foregoing embodiments and exemplary implementations, and details of this embodiment are not repeated herein.
It will be apparent to those skilled in the art that the modules or steps of the present application described above may be implemented by a general purpose computing device, they may be centralized in a single computing device or distributed across a network of multiple computing devices, and they may be implemented by program code executable by a computing device, such that they may be stored in a memory device and executed by the computing device, and in some cases, the steps shown or described may be executed in an order different from that described herein, or they may be separately fabricated as individual integrated circuit modules, or multiple modules or steps thereof may be fabricated as a single integrated circuit module. Thus, the present application is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the principle of the present application shall be included in the protection scope of the present application.

Claims (11)

1. A method for predicting memory failure, comprising:
acquiring memory fault information in a current period, wherein the memory fault information comprises the type of faults occurring in the current period in a memory of a server;
determining the frequency of the occurrence of each type of fault in the current period of the memory according to the memory fault information;
and determining a prediction result of the faults of each type occurring in the next period of the current period according to the number of times of the faults of each type occurring in the current period and the number of times of the faults of each type occurring in the historical period, wherein the historical period comprises one or more periods before the current period.
2. The method according to claim 1, wherein the determining of a prediction result of the faults of each type occurring in the memory in the next period of the current period, according to the number of times the faults of each type occur in the memory in the current period and the number of times the faults of each type occur in the memory in the historical period, comprises:
under the condition that each type comprises a first type and the history period comprises N periods before the current period, determining the predicted number of times of the first type of faults occurring in the next period according to N first history times and a first current time, wherein N is 1 or N is a positive integer greater than or equal to 2, the first current time is the number of times of the first type of faults occurring in the current period, and the N first history times comprise the number of times of the first type of faults occurring in each period of the N periods; or
determining a predicted probability that the fault of the first type occurs in the next cycle in the memory according to the N first historical times, N total historical times, the first current times and the current total times when the types include the first type and the historical cycles include the N cycles before the current cycle, where N is 1 or N is a positive integer greater than or equal to 2, the first current times is the number of times that the fault of the first type occurs in the current cycle, the current total times is the total number of times that the fault of the respective type occurs in the current cycle, the N first historical times includes the number of times that the fault of the first type occurs in each of the N cycles in the memory, and the N total historical times includes the total number of times that the fault of the respective type occurs in each of the N cycles in the memory.
3. The method according to claim 2, wherein the determining the predicted number of times that the first type of fault occurs in the next cycle in the memory according to the N first historical times and the first current times comprises:
determining a first fitting curve in a target coordinate system according to the N first historical times corresponding to the N cycles and the first current times corresponding to the current cycle, wherein the abscissa of the target coordinate system represents time, the ordinate of the target coordinate system represents the times of the first type of fault occurring in the memory, and the first fitting curve at least comprises points corresponding to the N cycles, the current cycle and the next cycle respectively;
and determining the value of the point on the ordinate corresponding to the next period on the first fitting curve as the prediction frequency.
4. The method according to claim 3, wherein said determining a first fitted curve in a target coordinate system according to the N first historical times corresponding to the N cycles and the first current time corresponding to the current cycle comprises:
and under the condition that a history fitting curve exists in the target coordinate system, correcting the history fitting curve according to the N first history times corresponding to the N periods and the first current time corresponding to the current period to obtain the first fitting curve, wherein the history fitting curve is a fitting curve determined according to at least part of the N first history times and the periods corresponding to the at least part of the N periods.
5. The method of claim 2, wherein said determining the predicted probability of the occurrence of the first type of fault in the next cycle based on the N first historical times, the N total historical times, the first current time, and the current total time comprises:
determining a second fitting curve under a target coordinate system according to N first historical probabilities corresponding to the N periods and the first current probability corresponding to the current period, wherein an abscissa of the target coordinate system represents time, an ordinate of the target coordinate system represents probability of the first type of fault occurring in the memory, the second fitting curve at least includes points corresponding to the N periods, the current period and the next period respectively, the N first historical probabilities are ratios of each of the N first historical times to a corresponding historical total number of the N historical total times, and the first current probability is a ratio of the first current time to the current total number of times;
and determining the value of the point corresponding to the next period on the second fitting curve on the ordinate as the prediction probability.
6. The method according to claim 1, wherein the determining of a prediction result of the faults of each type occurring in the memory in the next period of the current period, according to the number of times the faults of each type occur in the memory in the current period and the number of times the faults of each type occur in the memory in the historical period, comprises:
under the condition that each type comprises a second type and the history period comprises N periods before the current period, determining the predicted weighted cumulative number of times of the second type of faults occurring in the next period according to N second history numbers and a second current number, wherein N is 1 or N is a positive integer greater than or equal to 2, the second current number is the number of times of the second type of faults occurring in the current period, and the N second history numbers comprise the number of times of the second type of faults occurring in each period of the N periods; or
under the condition that the types include the second type and the history period includes N periods before the current period, determining a predicted weighted cumulative probability that the second type of fault occurs in the next period in the memory according to the N second history times, N total history times, the second current times and the current total times, where N is 1 or N is a positive integer greater than or equal to 2, the second current times is the number of times that the second type of fault occurs in the current period, the current total times is the total number of times that the types of faults occur in the current period, the N second history times includes the number of times that the second type of fault occurs in each period in the N periods in the memory, and the N total history times includes the total number of times that the types of faults occur in each period in the N periods in the memory.
7. The method according to claim 6, wherein the determining the predicted weighted cumulative number of occurrences of the second type of fault in the next cycle based on the N second historical numbers and the second current number comprises:
determining N historical weighted accumulation times according to the N second historical times corresponding to the N periods and N corresponding predetermined first historical weights;
determining the current weighting accumulated times according to the second current times corresponding to the current period and the corresponding predetermined first current weight;
determining a third fitting curve under a target coordinate system according to the N historical weighted accumulation times and the current weighted accumulation times, wherein the abscissa of the target coordinate system represents time, the ordinate of the target coordinate system represents the weighted accumulation times of the second type of fault occurring in the memory, and the third fitting curve at least comprises points corresponding to the N periods, the current period and the next period respectively;
and determining the value of the point on the ordinate corresponding to the next period on the third fitting curve as the prediction weighted accumulation frequency.
8. The method of claim 6, wherein said determining a predicted weighted cumulative probability of said second type of fault occurring within said next period of said memory based on said N second historical times, N historical total times, said second current times, and said current total times comprises:
determining N historical weighted cumulative probabilities according to N second historical probabilities corresponding to the N periods and N predetermined second historical weights, wherein the N second historical probabilities are ratios of each second historical time of the N second historical times to corresponding historical total times of the N historical total times;
determining a current weighted cumulative probability according to the N second historical probabilities, the N second historical weights, a second current probability corresponding to the current period and a corresponding predetermined second current weight, wherein the second current probability is the ratio of the second current times to the current total times;
determining a fourth fitted curve under a target coordinate system according to the N historical weighted cumulative probabilities and the current weighted cumulative probability, wherein the abscissa of the target coordinate system represents time, the ordinate of the target coordinate system represents the weighted cumulative probability of the second type of fault occurring in the memory, and the fourth fitted curve at least comprises points corresponding to the N periods, the current period and the next period respectively;
and determining the value of the point on the ordinate, corresponding to the next period, on the fourth fitting curve as the prediction weighted accumulation probability.
9. A prediction apparatus for memory failure, comprising:
an acquisition module, configured to acquire memory fault information in a current period, wherein the memory fault information comprises the type of faults occurring in the current period in a memory of a server;
a first determining module, configured to determine, according to the memory fault information, the number of times the faults of each type occur in the memory in the current period;
a second determining module, configured to determine, according to the number of times that each type of fault occurs in the current period in the memory and the number of times that each type of fault occurs in the historical period in the memory, a prediction result that each type of fault occurs in the next period of the current period in the memory, where the historical period includes one or more periods before the current period.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 8.
11. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method as claimed in any of claims 1 to 8 are implemented when the computer program is executed by the processor.
CN202211350534.8A 2022-10-31 2022-10-31 Memory fault prediction method and device, storage medium and electronic device Pending CN115686909A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211350534.8A CN115686909A (en) 2022-10-31 2022-10-31 Memory fault prediction method and device, storage medium and electronic device


Publications (1)

Publication Number Publication Date
CN115686909A true CN115686909A (en) 2023-02-03

Family

ID=85045615

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211350534.8A Pending CN115686909A (en) 2022-10-31 2022-10-31 Memory fault prediction method and device, storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN115686909A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination