CN116932324A - Memory bank fault prediction method and device and electronic equipment - Google Patents


Info

Publication number
CN116932324A
Authority
CN
China
Prior art keywords
fault
information
result
preset
memory bank
Legal status
Pending
Application number
CN202310918660.7A
Other languages
Chinese (zh)
Inventor
林文好
张益军
Current Assignee
Shenzhen Yousheng Bona Technology Co ltd
Original Assignee
Shenzhen Yousheng Bona Technology Co ltd
Application filed by Shenzhen Yousheng Bona Technology Co ltd
Priority to CN202310918660.7A
Publication of CN116932324A
Current legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3037 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a memory, e.g. virtual memory, cache
    • G06F11/3051 Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
    • G06F11/3058 Monitoring arrangements for monitoring environmental properties or parameters of the computing system or of the computing system component, e.g. monitoring of power, currents, temperature, humidity, position, vibrations
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features

Abstract

The application provides a memory bank fault prediction method, a memory bank fault prediction device and an electronic device, and relates to the technical field of data processing. The memory bank fault prediction method is applied to a server. It comprises the following steps:
- acquiring a monitoring data packet of a memory bank;
- inputting the monitoring data packet into a preset fault prediction model to obtain an output result;
- generating fault prediction information based on the output result in combination with historical fault information, where the fault prediction information indicates that a potential fault exists in the memory bank.

The technical solution provided by the application helps improve the accuracy of memory bank fault prediction.

Description

Memory bank fault prediction method and device and electronic equipment
Technical Field
The present application relates to the field of data processing technologies, and in particular to a memory bank fault prediction method and apparatus, and an electronic device.
Background
With the rapid development of internet services, the industry is paying increasing attention to infrastructure availability. Hardware failures, however, are ubiquitous, and the losses they cause are often enormous.
Among server components, memory banks are the second most common source of hardware failures, after hard disks. A server usually carries a large number of memory banks, and the most serious consequence of a memory bank failure is a system crash that takes the server down. It is therefore necessary to predict potential memory bank failures. In the related art, memory bank failures are generally predicted with peripheral tools. Because these tools carry uncontrollable data-detection risks, the detection data can be biased, and the prediction accuracy for memory bank failures is low.
There is therefore an urgent need for a memory bank fault prediction method, apparatus and electronic device.
Disclosure of Invention
The application provides a memory bank fault prediction method, a memory bank fault prediction device and an electronic device, which help improve the accuracy of memory bank fault prediction.
In a first aspect of the present application, a memory bank fault prediction method applied to a server is provided. The method includes the following steps:
- acquiring a monitoring data packet of a memory bank;
- inputting the monitoring data packet into a preset fault prediction model to obtain an output result;
- generating fault prediction information based on the output result in combination with historical fault information, where the fault prediction information indicates that a potential fault exists in the memory bank.

With this technical solution, the server first acquires the monitoring data packet of the memory bank and inputs it into the preset fault prediction model to obtain the model's output result. The server then generates the fault prediction information from the output result in combination with the historical fault information. Compared with the related art, this reduces the likelihood of biased detection data from peripheral tools. Predicting memory bank faults from the fault prediction information therefore helps improve prediction accuracy.
Optionally, the monitoring data packet includes temperature data and voltage data. Inputting the monitoring data packet into the preset fault prediction model to obtain the output result specifically includes:
- acquiring the temperature data and the voltage data;
- determining a temperature value and a voltage value from the temperature data and the voltage data;
- judging whether the temperature value is within a preset temperature range and, if not, outputting a first result indicating that the temperature data is abnormal;
- judging whether the voltage value is within a preset voltage range and, if not, outputting a second result indicating that the voltage data is abnormal;
- fusing the first result and the second result to obtain the output result.

With this technical solution, the server first acquires the temperature data and the voltage data and calculates the temperature value and the voltage value from them. The server then determines whether the temperature value is within the preset temperature range, and outputs the first result if it is not. Likewise, the server determines whether the voltage value is within the preset voltage range, and outputs the second result if it is not. Finally, the server fuses the first result and the second result to obtain the output result. Because the server monitors for abnormal temperature and abnormal voltage, it can predict memory bank faults caused by temperature anomalies, voltage anomalies, or both. This helps improve prediction accuracy.
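The two range checks and the fusion step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation:
- the preset ranges are hypothetical placeholders;
- the temperature and voltage values are assumed to be the mean of the sampled readings;
- "fusing" is read simply as combining the two anomaly flags into one output record.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical preset ranges; the application does not give concrete values.
PRESET_TEMPERATURE_RANGE_C = (0.0, 85.0)
PRESET_VOLTAGE_RANGE_V = (1.14, 1.26)


@dataclass
class OutputResult:
    temperature_abnormal: bool  # the "first result"
    voltage_abnormal: bool      # the "second result"


def in_range(value: float, bounds: tuple) -> bool:
    low, high = bounds
    return low <= value <= high


def evaluate_packet(temperature_samples, voltage_samples,
                    temperature_range=PRESET_TEMPERATURE_RANGE_C,
                    voltage_range=PRESET_VOLTAGE_RANGE_V) -> OutputResult:
    # Assumption: the representative value is the mean of the sampled readings.
    temperature_value = mean(temperature_samples)
    voltage_value = mean(voltage_samples)
    first_result = not in_range(temperature_value, temperature_range)
    second_result = not in_range(voltage_value, voltage_range)
    # "Fusion" here just combines both flags into a single output record.
    return OutputResult(first_result, second_result)
```

For example, `evaluate_packet([90.5, 91.0], [1.20, 1.21])` flags only the temperature, because the mean temperature of 90.75 °C exceeds the placeholder upper bound.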
Optionally, generating the fault prediction information based on the output result in combination with the historical fault information specifically includes:
- acquiring the historical fault information, which includes a fault phenomenon set;
- identifying the output result to determine a first fault phenomenon, which is temperature data abnormality, voltage data abnormality, or both;
- judging whether the first fault phenomenon exists in the fault phenomenon set;
- if it does, generating first fault prediction information that includes a first fault probability.

With this technical solution, the server first acquires the historical fault information and then identifies the output result to determine the first fault phenomenon. Next, the server determines whether the first fault phenomenon exists in the fault phenomenon set. If it does, the server generates the first fault prediction information together with the corresponding first fault probability. Using the historical fault information in this way to determine the first fault probability helps make that probability more accurate.
Optionally, the preset fault prediction model is trained before the monitoring data packet is input into it. Training the preset fault prediction model specifically includes:
- acquiring training information, which includes monitoring data packets and fault prediction information;
- inputting the training information into an adaptive feature fusion network for training to obtain a first training result;
- superposing and standardizing the first training result and the training information to obtain a second training result;
- inputting the second training result into the adaptive feature fusion network for processing to obtain a third training result;
- superposing and standardizing the third training result and the second training result, and repeating this process until a training information similarity matrix is output that satisfies a preset logistic regression condition.

With this technical solution, the server trains the preset fault prediction model before inputting the monitoring data packet into it. The server first acquires the training information and inputs it into the adaptive feature fusion network for training to obtain the first training result. It then superposes and standardizes the first training result with the training information to obtain the second training result. Next, it inputs the second training result into the adaptive feature fusion network to obtain the third training result. Finally, the server superposes and standardizes the third training result with the second training result, repeating until the training information similarity matrix is output. This iterative training and processing improves the accuracy and stability of the model, so that it adapts better to different data conditions and predicts and analyzes effectively.
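The application later states that the preset model is a Transformer. Under that reading, the iteration resembles the "Add & Norm" pattern of a Transformer encoder: each sub-layer output is added to its input ("superposition") and then layer-normalized ("standardization"). The following minimal sketch assumes that interpretation. The following details are hypothetical and do not come from the application:
- the fusion function is a stand-in for the adaptive feature fusion network;
- the similarity matrix is built from sigmoid-squashed, scaled dot products;
- the "logistic regression condition" is taken to be a bound on binary cross-entropy against same-label targets.

```python
import math


def layer_norm(vec, eps=1e-5):
    """'Standardization': rescale one feature vector to zero mean, unit variance."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]


def add_and_norm(sublayer_out, residual):
    """'Superposition' (element-wise residual addition) followed by standardization."""
    return layer_norm([a + b for a, b in zip(sublayer_out, residual)])


def fusion_network(vec, weights):
    """Hypothetical stand-in for the adaptive feature fusion network:
    each feature is blended with the vector's mean using a per-feature weight."""
    context = sum(vec) / len(vec)
    return [w * v + (1 - w) * context for w, v in zip(weights, vec)]


def encode(vec, layer_weights):
    """Fusion -> Add & Norm, repeated once per layer (first, second, third results...)."""
    x = vec
    for weights in layer_weights:
        x = add_and_norm(fusion_network(x, weights), x)
    return x


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def similarity_matrix(encoded):
    """Pairwise scaled dot products squashed into (0, 1) by the logistic function."""
    d = len(encoded[0])
    return [[sigmoid(sum(a * b for a, b in zip(u, v)) / d) for v in encoded]
            for u in encoded]


def meets_logistic_condition(matrix, same_label, max_loss=0.7):
    """Hypothetical condition: mean binary cross-entropy against
    'same label' targets must not exceed max_loss."""
    eps = 1e-12
    total, n = 0.0, 0
    for i, row in enumerate(matrix):
        for j, p in enumerate(row):
            y = 1.0 if same_label[i][j] else 0.0
            total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
            n += 1
    return total / n <= max_loss
```

In real training, the fusion weights would be learned by back-propagation until the condition holds. Here they are fixed inputs, so the sketch only shows the data flow.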
Optionally, if the first fault phenomenon does not exist in the fault phenomenon set, second fault prediction information is generated. It includes a second fault probability that is lower than the first fault probability.
With this technical solution, when the first fault phenomenon is not in the fault phenomenon set, the server generates the second fault prediction information with a corresponding second fault probability that is lower than the first fault probability. Providing fault probabilities at different levels lets administrators inspect memory banks with a matching degree of urgency, which helps improve the timeliness and robustness of the prediction data.
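The first/second tiering can be sketched as a set lookup. This is a minimal illustration under stated assumptions:
- the application gives 80% only as an example first fault probability, and 40% for the second level is a hypothetical placeholder;
- "exists in the fault phenomenon set" is assumed to mean that every identified anomaly has been recorded before.

```python
# Hypothetical probability levels: 80% is the application's example first fault
# probability; 40% is a placeholder that only respects "second < first".
FIRST_FAULT_PROBABILITY = 0.80
SECOND_FAULT_PROBABILITY = 0.40

TEMPERATURE_ABNORMAL = "temperature data abnormality"
VOLTAGE_ABNORMAL = "voltage data abnormality"


def identify_first_phenomenon(temperature_abnormal: bool, voltage_abnormal: bool) -> frozenset:
    """Map the fused output result to the first fault phenomenon (one or both anomalies)."""
    phenomena = set()
    if temperature_abnormal:
        phenomena.add(TEMPERATURE_ABNORMAL)
    if voltage_abnormal:
        phenomena.add(VOLTAGE_ABNORMAL)
    return frozenset(phenomena)


def predict_from_history(temperature_abnormal, voltage_abnormal, fault_phenomenon_set):
    first_phenomenon = identify_first_phenomenon(temperature_abnormal, voltage_abnormal)
    if not first_phenomenon:
        return None  # nothing abnormal, so no prediction is generated
    # Assumption: "exists in the set" means every identified anomaly was seen before.
    if first_phenomenon <= set(fault_phenomenon_set):
        return {"level": "first", "phenomenon": sorted(first_phenomenon),
                "probability": FIRST_FAULT_PROBABILITY}
    return {"level": "second", "phenomenon": sorted(first_phenomenon),
            "probability": SECOND_FAULT_PROBABILITY}
```

With a history containing only "temperature data abnormality", a pure temperature anomaly yields the first level. A voltage anomaly, alone or combined with temperature, falls back to the lower second level.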
Optionally, the monitoring data packet further includes log data. Inputting the monitoring data packet into the preset fault prediction model to obtain the output result further includes:
- performing multi-scale feature extraction on the log data according to a multi-scale channel attention mechanism to obtain multi-scale feature information;
- inputting the multi-scale feature information into the preset fault prediction model to obtain a third result, where the output result includes the third result.

With this technical solution, when the monitoring data packet includes log data, the server performs multi-scale feature extraction on the log data according to the multi-scale channel attention mechanism to obtain the multi-scale feature information. It then inputs that information into the preset fault prediction model to obtain the third result. Because the log data is examined in real time, the server can promptly and comprehensively identify abnormal log entries. This facilitates memory bank fault prediction and improves its precision.
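Multi-scale extraction with channel attention can be sketched as follows. This is a minimal illustration, and its inputs and simplifications are assumptions:
- the log data is assumed to have already been reduced to a per-interval count of memory-related events, such as error entries;
- each "scale" is a trailing moving-average window, so one channel is produced per window size;
- the attention step is a simplified squeeze-and-gate, where each channel is pooled to its mean and weighted by a sigmoid gate;
- `gate_weights` is a hypothetical stand-in for parameters that a real model would learn.

```python
import math


def moving_average(series, window):
    """Trailing moving average; the first points average over what is available."""
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def multi_scale_channels(series, windows=(1, 3, 5)):
    """One channel per scale: short windows keep bursts, long windows keep trends."""
    return [moving_average(series, w) for w in windows]


def channel_attention(channels, gate_weights=None):
    """Squeeze each channel to its mean, then gate it with a sigmoid.
    gate_weights stands in for parameters that would be learned in training."""
    if gate_weights is None:
        gate_weights = [1.0] * len(channels)
    descriptors = [sum(c) / len(c) for c in channels]
    gates = [1.0 / (1.0 + math.exp(-w * d)) for w, d in zip(gate_weights, descriptors)]
    return [[g * v for v in c] for g, c in zip(gates, channels)]


def multi_scale_features(event_counts, windows=(1, 3, 5), gate_weights=None):
    """Feature vector: the peak value of each attention-weighted channel."""
    weighted = channel_attention(multi_scale_channels(event_counts, windows), gate_weights)
    return [max(c) for c in weighted]
```

Consider a single burst such as `[0, 0, 0, 9, 0]`. It produces a large feature at the finest scale and progressively smaller ones at coarser scales. A sustained rise in error counts would instead remain visible at all scales.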
Optionally, inputting the multi-scale feature information into the preset fault prediction model to obtain the third result specifically includes:
- calculating a similarity value between the multi-scale feature information and preset abnormal feature information, where the preset fault prediction model stores several types of preset abnormal feature information in advance;
- comparing the similarity value with a preset similarity threshold to obtain the third result, where the preset similarity threshold is determined from a fault log set in the historical fault information.

With this technical solution, the server first calculates the similarity value between the multi-scale feature information and the preset abnormal feature information. It then compares that value with the preset similarity threshold to obtain the third result. This enables fine-grained analysis of abnormal log data, making memory bank fault prediction more comprehensive and more accurate.
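The similarity comparison can be sketched with cosine similarity. The application does not name a similarity measure, so this choice is an assumption. The threshold rule is also hypothetical: it is taken to be the lowest best-match similarity that any historical fault log achieved against the stored abnormal patterns. In other words, the threshold is the weakest match that still corresponded to a real fault. The pattern names and vectors in the usage note are invented for illustration.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector matches nothing
    return dot / (norm_u * norm_v)


def best_match(features, abnormal_patterns):
    """Return (pattern name, similarity) of the closest preset abnormal feature."""
    return max(((name, cosine_similarity(features, pattern))
                for name, pattern in abnormal_patterns.items()),
               key=lambda item: item[1])


def threshold_from_fault_logs(fault_log_features, abnormal_patterns):
    """Hypothetical rule: the weakest best-match similarity seen in real fault logs."""
    return min(best_match(f, abnormal_patterns)[1] for f in fault_log_features)


def third_result(features, abnormal_patterns, threshold):
    name, similarity = best_match(features, abnormal_patterns)
    return {"pattern": name, "similarity": similarity, "abnormal": similarity >= threshold}
```

For example, with the hypothetical patterns `{"ecc_burst": [1.0, 0.0], "thermal_drift": [0.0, 1.0]}` and historical fault-log features `[[2.0, 0.2], [0.1, 3.0]]`, the threshold comes out at about 0.995. A new feature vector `[5.0, 0.1]` then matches `ecc_burst` above that threshold.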
Optionally, generating the fault prediction information based on the output result in combination with the historical fault information further includes: if the third result indicates that the similarity value is greater than or equal to the preset similarity threshold, generating third fault prediction information. The third fault prediction information includes a third fault probability that is higher than the first fault probability.
With this technical solution, suppose the third result output by the preset fault prediction model indicates that the similarity value is greater than or equal to the preset similarity threshold. The server then generates the third fault prediction information with a corresponding third fault probability that is higher than the first fault probability. Historical fault information thus supplies experience and knowledge that help predict faults likely to occur in the future. High-probability fault predictions also carry more weight, helping operators make better decisions and take appropriate measures. More accurate and reliable fault prediction allows potential faults to be found early and preventive measures to be taken in time. This reduces the losses and impact of faults and improves system reliability and stability.
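The three levels can be combined into one selection rule. In this minimal sketch, a log match at or above the threshold takes precedence, and otherwise the temperature/voltage history lookup decides between the first and second levels. The application only fixes the ordering (third above first, second below first). The numeric values other than the 80% example are hypothetical, and the parameter names are illustrative.

```python
from typing import Optional

# Hypothetical values; only the ordering third > first > second comes from the text.
THIRD_FAULT_PROBABILITY = 0.95
FIRST_FAULT_PROBABILITY = 0.80   # the application's example value
SECOND_FAULT_PROBABILITY = 0.40


def generate_fault_prediction(similarity: Optional[float], threshold: float,
                              phenomenon_in_history: Optional[bool]) -> Optional[dict]:
    """similarity: the third result's similarity value (None if no log data).
    phenomenon_in_history: whether the first fault phenomenon is in the historical
    set (None if no temperature or voltage anomaly was found)."""
    if similarity is not None and similarity >= threshold:
        return {"level": "third", "probability": THIRD_FAULT_PROBABILITY}
    if phenomenon_in_history is True:
        return {"level": "first", "probability": FIRST_FAULT_PROBABILITY}
    if phenomenon_in_history is False:
        return {"level": "second", "probability": SECOND_FAULT_PROBABILITY}
    return None  # no anomaly of any kind
```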
In a second aspect of the present application, a memory bank fault prediction device is provided. The device is a server that includes an acquisition module and a processing module:
- the acquisition module is configured to acquire a monitoring data packet of a memory bank;
- the processing module is configured to input the monitoring data packet into a preset fault prediction model to obtain an output result;
- the processing module is further configured to generate fault prediction information based on the output result in combination with historical fault information, where the fault prediction information indicates that a potential fault exists in the memory bank.

In a third aspect of the application, an electronic device is provided. It includes a processor, a memory, a user interface and a network interface:
- the memory is configured to store instructions;
- the user interface and the network interface are configured to communicate with other devices;
- the processor is configured to execute the instructions stored in the memory so that the electronic device performs the method described above.

In summary, the one or more technical solutions provided in the embodiments of the present application have at least the following technical effects or advantages:
1. The server first acquires the monitoring data packet of the memory bank and inputs it into the preset fault prediction model to obtain the model's output result. It then generates the fault prediction information from the output result in combination with the historical fault information. Compared with the related art, this reduces the likelihood of biased detection data from peripheral tools and helps improve the accuracy of memory bank fault prediction;
2. The server monitors temperature data, voltage data and log data in real time and combines them with historical fault phenomena and fault logs. It performs recognition and analysis with the preset fault prediction model, whose accuracy and stability are improved through training. The model therefore adapts better to different data conditions and predicts and analyzes effectively, which substantially helps the server predict memory bank faults;
3. By drawing on the historical fault information, the server gains experience and knowledge that help predict faults likely to occur in the future. High-probability fault predictions carry more weight and help operators make better decisions and take appropriate measures. More accurate and reliable fault prediction allows potential faults to be found early and preventive measures to be taken in time. This reduces the losses and impact of faults and improves system reliability and stability.
Drawings
Fig. 1 is a flow chart of a memory bank fault prediction method according to an embodiment of the present application.
Fig. 2 is a schematic block diagram of a memory bank failure prediction apparatus according to an embodiment of the present application.
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Description of reference numerals: 21. acquisition module; 22. processing module; 31. processor; 32. communication bus; 33. user interface; 34. network interface; 35. memory.
Detailed Description
The technical solutions in the embodiments of this specification are described clearly and completely below with reference to the accompanying drawings, to help those skilled in the art better understand them. Evidently, the described embodiments are only some, rather than all, of the embodiments of the present application.
In the description of the embodiments of the present application, words such as "exemplary" or "for example" are used to present an example, illustration, or explanation. Any embodiment or design described as "exemplary" or "for example" in the embodiments of the application should not be construed as preferred over, or more advantageous than, other embodiments or designs. Rather, such words are intended to present related concepts in a concrete manner.
In the description of the embodiments of the application, the term "plurality" means two or more. For example, a plurality of systems means two or more systems, and a plurality of screen terminals means two or more screen terminals. Furthermore, the terms "first," "second," and the like are used for descriptive purposes only. They are not to be construed as indicating or implying relative importance, or as implicitly indicating the number of the technical features concerned. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include one or more such features. The terms "comprising," "including," "having," and their variations mean "including but not limited to," unless expressly specified otherwise.
Before the embodiments of the present application are described, some terms used in them are first defined.
Channel attention mechanism: an attention mechanism that aims to direct computing resources toward the most informative parts of the input signal. It is used together with a gating function, such as softmax or sigmoid, and the network learns the feature weights by back-propagating the final loss. Effective feature maps thereby receive large weights, while ineffective or weakly contributing feature maps receive small weights, which yields better results. The channel attention mechanism dynamically adjusts the features of each channel according to the input, which enhances the representational capability of the network.
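A common concrete form of channel attention is the squeeze-and-excitation (SE) block. This is a well-known technique, not necessarily the one used in the application, and it works in three steps:
1. each channel is squeezed to one descriptor by global average pooling;
2. a small bottleneck network (FC, ReLU, FC, sigmoid) turns the descriptors into per-channel gates in (0, 1);
3. each channel is rescaled by its gate.

The pure-Python sketch below omits bias terms. The weight matrices `w1` and `w2` are hypothetical inputs that would normally be learned by back-propagation.

```python
import math


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def squeeze_excite(feature_maps, w1, w2):
    """feature_maps: C channels, each a flattened list of activations.
    w1: (C/r) x C bottleneck weights; w2: C x (C/r) expansion weights."""
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(fm) / len(fm) for fm in feature_maps]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives a gate in (0, 1) per channel.
    hidden = [max(0.0, h) for h in matvec(w1, z)]
    gates = [1.0 / (1.0 + math.exp(-s)) for s in matvec(w2, hidden)]
    # Scale: reweight each channel so informative channels keep more of their magnitude.
    scaled = [[g * v for v in fm] for g, fm in zip(gates, feature_maps)]
    return scaled, gates
```

With all-zero weights, every gate is sigmoid(0) = 0.5 and every channel is halved. Training moves the gates toward 1 for informative channels and toward 0 for uninformative ones.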
With the rapid development of internet services, the industry is paying increasing attention to infrastructure availability. Hardware failures, however, are ubiquitous, and the losses they cause are often enormous. Among server components, memory banks are the second most common source of hardware failures, after hard disks. A server usually carries a large number of memory banks. The most serious consequence of a memory bank failure is a system crash that takes the server down, which is unacceptable for upper-layer services.
It is therefore necessary to predict memory bank failures. At present, memory bank fault prediction relies mainly on third-party peripherals and detection tools. These tools are not stable enough and carry uncontrollable data-detection risks, so the detection data can be biased and the accuracy of memory bank fault prediction is low.
To solve the above technical problems, the present application provides a memory bank fault prediction method. Refer to Fig. 1, which is a flow chart of the memory bank fault prediction method according to an embodiment of the present application. The method is applied to a server and includes the following steps S110 to S130:
S110, acquiring a monitoring data packet of the memory bank.
Specifically, before predicting memory bank faults, the server can use its own interfaces, such as IPMI (Intelligent Platform Management Interface). IPMI is a hardware management interface standard through which various kinds of hardware information can be obtained from the server. Through this interface, the server can acquire the monitoring data packet of the memory bank, and it can access and extract the monitoring data in the packet via a command line or a Web interface. The server can also acquire monitoring data packets sent by sensors. The monitoring data packet stores the individual monitoring data of the memory bank, which are obtained through real-time monitoring. For example, the monitoring data packet may include:
- temperature data and voltage data;
- clock frequency data;
- memory capacity and usage data;
- fault report data and event log data;
- read/write speed data, and the like.

In the embodiments of the application, the server manages the memory bank and the monitoring data packet and provides background services. It may be a single server, a server cluster composed of several servers, or a cloud computing service center.
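As one possible way to read memory-related sensor values through IPMI, the pipe-separated table printed by `ipmitool sensor` can be parsed into a simple monitoring data packet. Its first three columns are the sensor name, reading and units. The sketch below makes three assumptions:
- sensor names vary by BMC vendor, so filtering on the keyword "DIMM" is hypothetical;
- the sample lines are invented for illustration;
- the dictionary layout of the packet is an assumption, not the application's format.

Running `read_ipmi_sensors` requires `ipmitool` and access to the local BMC.

```python
import subprocess

# Invented sample in the column layout of `ipmitool sensor` output.
SAMPLE_OUTPUT = """\
DIMM A1 Temp     | 45.000     | degrees C  | ok    | na        | na        | na        | 80.000    | 85.000    | na
DIMM AB VDDQ     | 1.210      | Volts      | ok    | na        | 1.080     | na        | na        | 1.320     | na
CPU1 Temp        | 52.000     | degrees C  | ok    | na        | na        | na        | 90.000    | 95.000    | na
DIMM C1 Temp     | na         | degrees C  | na    | na        | na        | na        | na        | na        | na
DIMM Status      | 0x0        | discrete   | 0x0080| na        | na        | na        | na        | na        | na
"""


def read_ipmi_sensors() -> str:
    """Run `ipmitool sensor` on the local host and return its text output."""
    return subprocess.run(["ipmitool", "sensor"], capture_output=True,
                          text=True, check=True).stdout


def parse_sensor_output(text: str, name_keywords=("DIMM",)) -> dict:
    """Collect temperature and voltage readings of memory-related sensors."""
    packet = {"temperature": {}, "voltage": {}}
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 3:
            continue
        name, reading, units = fields[0], fields[1], fields[2]
        if not any(k in name for k in name_keywords):
            continue
        try:
            value = float(reading)  # "na" and discrete hex readings are skipped
        except ValueError:
            continue
        if units == "degrees C":
            packet["temperature"][name] = value
        elif units == "Volts":
            packet["voltage"][name] = value
    return packet
```

On a real host, the call would be `parse_sensor_output(read_ipmi_sensors())`. Readings that are unavailable ("na") or discrete are left out of the packet.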
S120, inputting the monitoring data packet into a preset fault prediction model to obtain an output result.
Specifically, after acquiring the monitoring data packet, the server inputs it into the preset fault prediction model to obtain the output result. The preset fault prediction model is established in advance to analyze and predict potential faults of the memory bank, and it is a Transformer model. The model analyzes the content of the monitoring data packet against the correspondences stored in it, and the output result is what it produces from that analysis. The output result reflects whether each item of monitoring data is normal or abnormal. This makes it easier to analyze how abnormal monitoring data affects the memory bank, and it enables the model to predict potential memory bank faults in real time.
In one possible implementation, the monitoring data packet includes temperature data and voltage data. Inputting the monitoring data packet into the preset fault prediction model to obtain the output result specifically includes:
- acquiring the temperature data and the voltage data;
- determining a temperature value and a voltage value from the temperature data and the voltage data;
- judging whether the temperature value is within a preset temperature range and, if not, outputting a first result indicating that the temperature data is abnormal;
- judging whether the voltage value is within a preset voltage range and, if not, outputting a second result indicating that the voltage data is abnormal;
- fusing the first result and the second result to obtain the output result.

Specifically, when the monitoring data packet contains temperature data and voltage data, the server can quickly determine the temperature value and the voltage value from them. The server then determines whether the temperature value is within the preset temperature range and whether the voltage value is within the preset voltage range. If the temperature value is outside the preset temperature range, the preset fault prediction model outputs the first result. If the voltage value is outside the preset voltage range, it outputs the second result. Finally, the server fuses the first result and the second result to obtain the output result. The server determines the normal operating temperature range and operating voltage range of the memory bank by analyzing the basic information stored for the memory bank. The first result indicates a temperature anomaly; for example, an overheating memory bank may malfunction. The second result indicates a voltage anomaly; for example, an excessive supply voltage may stop the memory bank from working properly. The server therefore needs to monitor and analyze both temperature and voltage.
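One hypothetical source for the memory bank's basic information on Linux is the Memory Device (SMBIOS type 17) record printed by `dmidecode --type 17`. That record typically lists fields such as "Type: DDR4" and "Configured Voltage: 1.2 V". The sketch below derives a preset voltage range in two ways: from the configured voltage when it is present, or otherwise from the nominal supply voltage of the DDR generation. The following parts are assumptions, not values from the application:
- the ±5% tolerance;
- the default temperature range.

```python
# Nominal VDD per DDR generation (DDR3 1.5 V, DDR4 1.2 V, DDR5 1.1 V).
NOMINAL_VDD_V = {"DDR3": 1.5, "DDR4": 1.2, "DDR5": 1.1}


def parse_memory_device(text: str) -> dict:
    """Turn 'Key: Value' lines of a dmidecode Memory Device record into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def preset_ranges(info: dict, tolerance: float = 0.05,
                  temperature_range_c=(0.0, 85.0)) -> dict:
    """Hypothetical rule: nominal voltage +/- tolerance; the temperature range is a placeholder."""
    configured = info.get("Configured Voltage", "")
    if configured.endswith(" V"):
        nominal = float(configured[:-2])
    else:
        nominal = NOMINAL_VDD_V[info["Type"]]
    return {"voltage_v": (nominal * (1 - tolerance), nominal * (1 + tolerance)),
            "temperature_c": temperature_range_c}
```

For a DDR4 module that reports a configured voltage of 1.2 V, this yields a voltage range of roughly 1.14 to 1.26 V.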
S130, generating fault prediction information based on the output result in combination with historical fault information, where the fault prediction information indicates that a potential fault exists in the memory bank.
Specifically, the server first acquires the monitoring data packet of the memory bank and inputs it into the preset fault prediction model to obtain the model's output result. The server then generates the fault prediction information from the output result in combination with the historical fault information. Compared with the related art, this reduces the likelihood of biased detection data from peripheral tools. Predicting memory bank faults from the fault prediction information therefore helps improve prediction accuracy.
The output result includes results corresponding to multiple kinds of monitoring data, for example temperature anomaly, voltage anomaly, clock frequency anomaly, insufficient memory, and excessive memory usage. The historical fault information records past faults of the memory bank. It includes fault time, fault cause, fault phenomenon, number of faults, fault handling measures, and the like. The fault prediction information includes a fault probability and a fault type. For example, it may read "the memory bank temperature is too high and a fault is likely". It may also read "note that memory bank XX is likely to suffer the same kind of downtime fault as the one at 10:06:22 on the 27th of … 2021".
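The records described above can be modeled with two small data classes. The field names mirror the items listed in the text. The class names, the message format and the helper `phenomenon_set` are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class HistoricalFault:
    fault_time: datetime
    cause: str
    phenomenon: str
    count: int          # number of times this fault occurred
    treatment: str      # fault handling measures taken


@dataclass
class FaultPrediction:
    bank_id: str
    fault_type: str
    probability: float  # fault probability in [0, 1]

    def message(self) -> str:
        return f"Memory bank {self.bank_id}: {self.fault_type} likely (p={self.probability:.0%})"


def phenomenon_set(history) -> set:
    """Hypothetical helper: the fault phenomenon set drawn from historical records."""
    return {record.phenomenon for record in history}
```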
In one possible implementation, generating the fault prediction information based on the output result in combination with the historical fault information specifically includes: acquiring the historical fault information, which includes a fault phenomenon set; identifying the output result to determine a first fault phenomenon, which is one or both of a temperature data abnormality and a voltage data abnormality; determining whether the first fault phenomenon exists in the fault phenomenon set; and, if it does, generating first fault prediction information that includes a first fault probability.
Specifically, the server generates the fault prediction information from the output result in combination with the historical fault information as follows. The server first acquires the historical fault information and then identifies the output result to determine the first fault phenomenon; when the monitoring data are temperature data and voltage data, the first fault phenomenon is one or both of a temperature data abnormality and a voltage data abnormality. Next, the server determines whether the first fault phenomenon exists in the fault phenomenon set and, if it does, generates first fault prediction information. A fault phenomenon is an observable symptom caused by abnormal data of the server's memory bank or by the memory bank itself, and the fault phenomenon set stores multiple such phenomena, for example memory bank overheating, severe vibration of the memory bank, computer blue screens, damaged files, application crashes, and unexpected server restarts. The first fault probability is the predicted probability that the server's memory bank will fail, for example 80%.
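The matching step can be sketched with set operations, assuming the output result is a set of string flags such as those produced by the range checks. Treating "exists in the fault phenomenon set" as "at least one identified phenomenon appears in the set" is an assumption, as is the 0.80 default, which only mirrors the 80% example above.

```python
from typing import Optional

TEMPERATURE_ABNORMAL = "temperature_abnormal"
VOLTAGE_ABNORMAL = "voltage_abnormal"


def identify_first_phenomena(output_result: set[str]) -> set[str]:
    """Keep only the temperature and/or voltage abnormality flags (one or both)."""
    return output_result & {TEMPERATURE_ABNORMAL, VOLTAGE_ABNORMAL}


def first_fault_prediction(output_result: set[str], history_phenomena: set[str],
                           first_probability: float = 0.80) -> Optional[dict]:
    """Return first fault prediction information, or None if nothing matches history."""
    first = identify_first_phenomena(output_result)
    known = first & history_phenomena
    if not known:
        return None  # left to the second-probability branch
    return {"fault_types": sorted(known), "probability": first_probability}
```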
In one possible implementation, if the first fault phenomenon does not exist in the fault phenomenon set, second fault prediction information is generated; it includes a second fault probability that is lower than the first fault probability.
Specifically, when the first fault phenomenon does not exist in the fault phenomenon set, the server generates second fault prediction information, which includes a second fault probability. The second fault probability applies to a fault phenomenon that does not appear among the historical fault phenomena, that is, a potential new fault phenomenon of the memory bank. Because the fault history already covers a large number of fault phenomena and this one is not among them, the server regards its reliability as low and sets the second fault probability lower than the first fault probability. For example, if fault X occurs in the memory bank but no fault X appears in the historical faults, the fault probability is 65%.
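The two-tier assignment can be sketched per phenomenon as follows. The probabilities 0.80 and 0.65 are the example values from the text rather than fixed parameters of the method, and scoring each phenomenon independently is an assumption.

```python
FIRST_FAULT_PROBABILITY = 0.80   # phenomenon already present in the fault history
SECOND_FAULT_PROBABILITY = 0.65  # potential new phenomenon, trusted less


def score_phenomena(phenomena: set[str], history_phenomena: set[str]) -> dict[str, float]:
    """Assign the first or second fault probability to each observed phenomenon."""
    return {
        p: FIRST_FAULT_PROBABILITY if p in history_phenomena else SECOND_FAULT_PROBABILITY
        for p in phenomena
    }


print(score_phenomena({"voltage_abnormal"}, {"temperature_abnormal"}))  # {'voltage_abnormal': 0.65}
```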
In one possible implementation, the monitoring data packet further includes log data, and inputting the monitoring data packet into the preset fault prediction model to obtain the output result further includes: performing multi-scale feature extraction on the log data according to a multi-scale channel attention mechanism to obtain multi-scale feature information; and inputting the multi-scale feature information into the preset fault prediction model to obtain a third result, where the output result includes the third result.
Specifically, the monitoring data packet obtained by the server further includes log data, and the server obtains the output result for the log data as follows: the server performs multi-scale feature extraction on the log data according to the multi-scale channel attention mechanism to obtain multi-scale feature information, and then inputs the multi-scale feature information into the preset fault prediction model to obtain a third result. The multi-scale channel attention mechanism weights the channels of the feature information at multiple scales, strengthening important channel features and suppressing useless channel information. In the embodiment of the application, the multi-scale channel attention mechanism improves the expressiveness and robustness of the feature information, thereby improving the model's analysis and processing of log data.
In one possible implementation, inputting the multi-scale feature information into the preset fault prediction model to obtain the third result specifically includes: calculating a similarity value between the multi-scale feature information and preset abnormal feature information, where multiple kinds of preset abnormal feature information are prestored in the preset fault prediction model; and comparing the similarity value with a preset similarity threshold to obtain the third result, where the preset similarity threshold is determined by a fault log set in the historical fault information.
Specifically, the server first calculates the similarity value between the multi-scale feature information and the preset abnormal feature information, and then compares the similarity value with the preset similarity threshold to obtain the third result. The server calculates the similarity value using Hamming similarity, which can quickly find the matching preset abnormal feature information within large-scale data, improving data accuracy while maintaining matching efficiency. The historical fault information includes a fault log set. Among the monitoring data, the fault log is the ECC (error checking and correcting) log, which is the indicator that best reflects memory bank faults. The ECC function of the memory bank can detect and automatically correct bit errors in the memory: when a memory bank fault occurs, ECC records the error event and corrects it if necessary. Therefore, by monitoring whether the ECC function works normally and whether error events are recorded in the log data, the server can identify whether the memory bank has a fault and take corresponding repair measures in time.
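The multi-scale channel attention step can be sketched without any library, as below. This is a parameter-free simplification in the spirit of squeeze-and-excitation style channel attention, not the model used by the method: a trained network would pass the channel descriptors through learned layers before the sigmoid gate. The window sizes `scales=(1, 2, 4)` and the choice of "largest window mean" as the per-scale descriptor are assumptions made for illustration.

```python
import math


def window_means(x: list[float], size: int) -> list[float]:
    """Mean of each non-overlapping window of `size` (the last window may be shorter)."""
    return [sum(x[i:i + size]) / len(x[i:i + size]) for i in range(0, len(x), size)]


def multiscale_channel_attention(features: list[list[float]],
                                 scales: tuple[int, ...] = (1, 2, 4)) -> list[list[float]]:
    """Reweight each channel by a sigmoid gate computed from multi-scale descriptors.

    features: C non-empty channels, each a sequence of T values
    (for example, an embedded window of log data).
    """
    weighted = []
    for channel in features:
        # One descriptor per scale: the strongest local mean at that window size.
        descriptors = [max(window_means(channel, s)) for s in scales]
        score = sum(descriptors) / len(descriptors)
        weight = 1.0 / (1.0 + math.exp(-score))  # sigmoid gate in (0, 1)
        weighted.append([weight * v for v in channel])
    return weighted
```

Because short windows react to bursts, a channel with a brief spike receives a larger weight than a flat channel with the same overall mean.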
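A sketch of the Hamming-similarity comparison follows. It assumes the multi-scale feature information has been binarized into equal-length bit strings (here by a hypothetical sign rule in `binarize`), and it uses one possible, hypothetical rule for deriving the threshold from the fault log set: the lowest best-match similarity observed among known fault logs. Neither detail is specified by the method.

```python
def binarize(features: list[float]) -> str:
    """Hypothetical binarization: 1 for positive values, 0 otherwise."""
    return "".join("1" if v > 0 else "0" for v in features)


def hamming_similarity(a: str, b: str) -> float:
    """Share of positions at which two equal-length bit strings agree."""
    if len(a) != len(b) or not a:
        raise ValueError("codes must be non-empty and of equal length")
    distance = sum(x != y for x, y in zip(a, b))
    return 1.0 - distance / len(a)


def best_match(code: str, abnormal_codes: list[str]) -> float:
    """Highest similarity to any prestored abnormal feature code."""
    return max(hamming_similarity(code, ref) for ref in abnormal_codes)


def threshold_from_fault_logs(fault_log_codes: list[str], abnormal_codes: list[str]) -> float:
    """Hypothetical rule: the weakest best match seen among known fault logs."""
    return min(best_match(c, abnormal_codes) for c in fault_log_codes)


def third_result(code: str, abnormal_codes: list[str], threshold: float) -> bool:
    """True when the similarity value reaches the preset similarity threshold."""
    return best_match(code, abnormal_codes) >= threshold
```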
In one possible implementation, generating the fault prediction information based on the output result in combination with the historical fault information further includes: if the third result indicates that the similarity value is greater than or equal to the preset similarity threshold, generating third fault prediction information that includes a third fault probability, where the third fault probability is higher than the first fault probability.
Specifically, when the server confirms that the third result indicates that the similarity value is greater than or equal to the preset similarity threshold, it generates third fault prediction information with a corresponding third fault probability, where third fault probability > first fault probability > second fault probability. Because the log data is the indicator that best reflects memory bank faults, the server trusts it more than the other monitoring data. For example, the first fault probability is 80%, the second fault probability is 65%, and the third fault probability is 90%.
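The ordering of the three probabilities can be expressed as a small decision function. Letting a log match take precedence over the historical-phenomenon check is an assumption consistent with third > first > second; the constants are the example values given above.

```python
FIRST_FAULT_PROBABILITY = 0.80   # phenomenon found in the fault history
SECOND_FAULT_PROBABILITY = 0.65  # potential new phenomenon
THIRD_FAULT_PROBABILITY = 0.90   # log features match a prestored abnormal pattern


def final_fault_probability(log_match: bool, phenomenon_in_history: bool) -> float:
    """Log evidence (the third result) is trusted most, then the fault history."""
    if log_match:
        return THIRD_FAULT_PROBABILITY
    if phenomenon_in_history:
        return FIRST_FAULT_PROBABILITY
    return SECOND_FAULT_PROBABILITY
```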
In one possible embodiment, the preset fault prediction model is trained before the monitoring data packet is input into it. Training the preset fault prediction model specifically includes: acquiring training information, which includes monitoring data packets and fault prediction information, and inputting the training information into an adaptive feature fusion network for training to obtain a first training result; superposing and standardizing the first training result and the training information to obtain a second training result; inputting the second training result into the adaptive feature fusion network for processing to obtain a third training result; and superposing and standardizing the third training result and the second training result, repeating until a training information similarity matrix is output that satisfies a preset logistic regression condition.
Specifically, the server trains the preset fault prediction model before inputting the monitoring data packet into it. First, the server acquires the training information and inputs it into the adaptive feature fusion network for training to obtain a first training result. Next, the server superposes and standardizes the first training result and the training information to obtain a second training result. The server then inputs the second training result into the adaptive feature fusion network to obtain a third training result. Finally, the server superposes and standardizes the third training result and the second training result, repeating until a training information similarity matrix is output.
The preset logistic regression condition is set in advance by the administrator of the server. Conventional layer-wise feature cascading or feature fusion methods generally weight, select, and fuse features based on empirically or manually designed weights and rules. In the embodiment of the application, the adaptive feature fusion network instead learns and optimizes the model architecture and parameters adaptively from data, which improves the model's adaptability to complex and changing scenarios. Through this repeated training and processing, the accuracy and stability of the model improve, so that it adapts better to different data conditions and predicts and analyzes effectively.
In one possible implementation, after generating the fault prediction information, the server displays it, or sends it to user equipment so that the server administrator corresponding to the user equipment can inspect and maintain the memory bank of the server.
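The data flow of this training loop can be sketched as below, reading "superposing and standardizing" as a residual addition followed by layer normalization (an interpretation, not confirmed by the text). The adaptive feature fusion network is replaced by a parameter-free, attention-style weighted mix whose weights are computed from the data itself, so the sketch shows the forward data flow rather than parameter learning. The sigmoid-based `similarity_matrix` and the convergence test standing in for the "preset logistic regression condition" are hypothetical.

```python
import math


def layer_norm(v: list[float], eps: float = 1e-5) -> list[float]:
    """Standardize a vector to zero mean and (roughly) unit variance."""
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / len(v)
    return [(x - mean) / math.sqrt(var + eps) for x in v]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def adaptive_fusion(rows: list[list[float]]) -> list[list[float]]:
    """Stand-in fusion: each row becomes a softmax-weighted mix of all rows,
    with weights computed from the rows themselves (no learned parameters)."""
    dim = len(rows[0])
    fused = []
    for r in rows:
        scores = [dot(r, other) / math.sqrt(dim) for other in rows]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        fused.append([sum(w * other[j] for w, other in zip(weights, rows)) for j in range(dim)])
    return fused


def add_and_norm(x: list[list[float]], y: list[list[float]]) -> list[list[float]]:
    """'Superpose and standardize': residual addition, then layer normalization."""
    return [layer_norm([a + b for a, b in zip(xr, yr)]) for xr, yr in zip(x, y)]


def similarity_matrix(rows: list[list[float]]) -> list[list[float]]:
    """Pairwise sigmoid of scaled dot products, giving values in (0, 1)."""
    dim = len(rows[0])
    return [[1.0 / (1.0 + math.exp(-dot(a, b) / dim)) for b in rows] for a in rows]


def run_fusion(training_rows: list[list[float]], tol: float = 1e-4,
               max_rounds: int = 20) -> list[list[float]]:
    # Second result: the fused (first) result superposed with the training information.
    h = add_and_norm(training_rows, adaptive_fusion(training_rows))
    for _ in range(max_rounds):
        # Third result, superposed with the second result and standardized.
        nxt = add_and_norm(h, adaptive_fusion(h))
        change = max(abs(p - q)
                     for old_row, new_row in zip(similarity_matrix(h), similarity_matrix(nxt))
                     for p, q in zip(old_row, new_row))
        h = nxt
        if change < tol:  # hypothetical stand-in for the logistic regression condition
            break
    return similarity_matrix(h)
```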
The application also provides a memory bank fault prediction device. Referring to fig. 2, which is a schematic block diagram of the memory bank fault prediction device according to an embodiment of the application, the memory bank fault prediction device is a server comprising an acquisition module 21 and a processing module 22. The acquisition module 21 is configured to acquire the monitoring data packet of a memory bank; the processing module 22 is configured to input the monitoring data packet into a preset fault prediction model to obtain an output result; and the processing module 22 is further configured to generate fault prediction information based on the output result in combination with the historical fault information, where the fault prediction information indicates that a potential fault exists in the memory bank.
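The division of work between the two modules can be sketched as below. The class names, the injected `read_packet` and `model` callables, and the dictionary returned by `predict` are hypothetical; they only illustrate that the acquisition module 21 supplies data while the processing module 22 runs the model and consults the historical fault information.

```python
from typing import Callable


class AcquisitionModule:
    """Acquisition module 21: obtains the monitoring data packet of a memory bank."""

    def __init__(self, read_packet: Callable[[], dict]):
        self._read_packet = read_packet  # hypothetical data source

    def acquire(self) -> dict:
        return self._read_packet()


class ProcessingModule:
    """Processing module 22: runs the model, then consults the fault history."""

    def __init__(self, model: Callable[[dict], set], history_phenomena: set):
        self._model = model
        self._history = history_phenomena

    def predict(self, packet: dict) -> dict:
        output = self._model(packet)  # output result of the preset model
        return {
            "output": output,
            "known_phenomena": output & self._history,
            "potential_fault": bool(output),
        }


class FaultPredictionServer:
    """The device of fig. 2: a server composed of the two modules."""

    def __init__(self, acquisition: AcquisitionModule, processing: ProcessingModule):
        self.acquisition = acquisition
        self.processing = processing

    def run_once(self) -> dict:
        return self.processing.predict(self.acquisition.acquire())
```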
In one possible implementation manner, the monitoring data packet includes temperature data and voltage data, and the monitoring data packet is input into a preset fault prediction model to obtain an output result, which specifically includes: the acquisition module 21 acquires temperature data and voltage data; the processing module 22 determines a temperature value and a voltage value from the temperature data and the voltage data; the processing module 22 judges whether the temperature value is within a preset temperature range, and if the temperature value is not within the preset temperature range, outputs a first result, wherein the first result is used for indicating that the temperature data is abnormal; the processing module 22 judges whether the voltage value is within a preset voltage range, and if the voltage value is not within the preset voltage range, outputs a second result, wherein the second result is used for indicating that the voltage data is abnormal; the processing module 22 fuses the first result and the second result to obtain an output result.
In one possible implementation, the processing module 22 generating the fault prediction information based on the output result in combination with the historical fault information specifically includes: the acquisition module 21 acquires the historical fault information, which includes a fault phenomenon set; the processing module 22 identifies the output result and determines a first fault phenomenon, which is one or both of a temperature data abnormality and a voltage data abnormality; the processing module 22 determines whether the first fault phenomenon exists in the fault phenomenon set; and if it does, the processing module 22 generates first fault prediction information that includes a first fault probability.
In one possible implementation, the preset fault prediction model is trained before the processing module 22 inputs the monitoring data packets into the preset fault prediction model; the processing module 22 trains a preset failure prediction model, specifically including: the acquisition module 21 acquires training information including a monitoring data packet and failure prediction information; the processing module 22 inputs the training information into the self-adaptive feature fusion network for training to obtain a first training result; the processing module 22 performs superposition and standardization processing on the first training result and the training information to obtain a second training result; the processing module 22 inputs the second training result into the adaptive feature fusion network to be processed, so as to obtain a third training result; the processing module 22 performs superposition and standardization processing on the third training result and the second training result until a training information similarity matrix is output, where the training information similarity matrix meets a preset logistic regression condition.
In one possible implementation, if the first fault phenomenon does not exist in the fault phenomenon set, the processing module 22 generates second fault prediction information that includes a second fault probability lower than the first fault probability.
In a possible implementation manner, the monitoring data packet further includes log data, and the processing module 22 inputs the monitoring data packet into a preset fault prediction model to obtain an output result, and specifically further includes: the processing module 22 performs multi-scale feature extraction on the log data according to a multi-scale channel attention mechanism to obtain multi-scale feature information; the processing module 22 inputs the multi-scale feature information into a preset failure prediction model to obtain a third result, and the output result includes the third result.
In one possible implementation, the processing module 22 inputs the multi-scale feature information into a preset failure prediction model to obtain a third result, specifically includes: the processing module 22 calculates a similarity value between the multi-scale feature information and preset abnormal feature information, and a plurality of preset abnormal feature information are prestored in a preset fault prediction model; the processing module 22 compares the similarity value with a preset similarity threshold value, which is determined by the fault log set in the historical fault information, to obtain the third result.
In one possible implementation, the processing module 22 generates the fault prediction information based on the output result and in combination with the historical fault information, and specifically further includes: if the third result indicates that the similarity value is greater than or equal to the preset similarity threshold, the processing module 22 generates third failure prediction information, where the third failure prediction information includes a third failure probability, and the third failure probability is higher than the first failure probability.
The application further provides an electronic device. Referring to fig. 3, which is a schematic structural diagram of the electronic device according to an embodiment of the application, the electronic device may include at least one processor 31, at least one network interface 34, a user interface 33, a memory 35, and at least one communication bus 32.
The communication bus 32 enables connection and communication between these components.
The user interface 33 may include a Display screen (Display) and a Camera (Camera), and the optional user interface 33 may further include a standard wired interface and a standard wireless interface.
The network interface 34 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), among others.
The processor 31 may include one or more processing cores. The processor 31 connects the various parts of the server through various interfaces and lines, and performs the server's functions and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory 35 and invoking the data stored in the memory 35. Optionally, the processor 31 may be implemented in hardware as at least one of digital signal processing (Digital Signal Processing, DSP), a field programmable gate array (Field-Programmable Gate Array, FPGA), or a programmable logic array (Programmable Logic Array, PLA). The processor 31 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and so on; the GPU renders and draws the content to be shown on the display screen; and the modem handles wireless communications. It will be appreciated that the modem may instead not be integrated into the processor 31 and may be implemented by a separate chip.
The memory 35 may include a random access memory (Random Access Memory, RAM) or a read-only memory (Read-Only Memory, ROM). Optionally, the memory 35 includes a non-transitory computer-readable storage medium. The memory 35 may be used to store instructions, programs, code sets, or instruction sets. The memory 35 may include a program storage area and a data storage area: the program storage area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playback function, or an image playback function), instructions for implementing the above method embodiments, and so on; the data storage area may store the data involved in the above method embodiments. Optionally, the memory 35 may also be at least one storage device located remotely from the processor 31. As shown in fig. 3, the memory 35, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and an application program for the memory bank fault prediction method.
In the electronic device shown in fig. 3, the user interface 33 is mainly used to provide an input interface for the user and to acquire data input by the user; and the processor 31 may be configured to invoke the application program of the memory bank fault prediction method stored in the memory 35, which, when executed by one or more processors, causes the electronic device to perform the method of one or more of the above embodiments.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but those skilled in the art should understand that the present application is not limited by the described order of acts, because some steps may be performed in other orders or concurrently according to the present application. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the acts and modules involved are not necessarily required by the present application.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division of units is merely a division of logical functions, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some service interfaces, devices, or units, and may be electrical or in other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods of the various embodiments of the present application. The aforementioned memory includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a magnetic disk, or an optical disc.
The foregoing is merely exemplary embodiments of the present disclosure and is not intended to limit the scope of the present disclosure. That is, equivalent changes and modifications are contemplated by the teachings of this disclosure, which fall within the scope of the present disclosure. Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a scope and spirit of the disclosure being indicated by the claims.

Claims (10)

1. A memory bank failure prediction method, applied to a server, comprising:
acquiring a monitoring data packet of a memory bank;
inputting the monitoring data packet into a preset fault prediction model to obtain an output result;
and generating fault prediction information based on the output result and combining with historical fault information, wherein the fault prediction information is used for indicating that potential faults exist in the memory bank.
2. The method for predicting a memory bank fault according to claim 1, wherein the monitoring data packet includes temperature data and voltage data, and the step of inputting the monitoring data packet into a preset fault prediction model to obtain an output result specifically includes:
acquiring the temperature data and the voltage data;
determining a temperature value and a voltage value according to the temperature data and the voltage data;
judging whether the temperature value is within a preset temperature range, and if the temperature value is not within the preset temperature range, outputting a first result, wherein the first result is used for indicating that the temperature data is abnormal;
judging whether the voltage value is within a preset voltage range, and if the voltage value is not within the preset voltage range, outputting a second result, wherein the second result is used for indicating that the voltage data is abnormal;
and fusing the first result and the second result to obtain the output result.
3. The memory bank fault prediction method according to claim 1, wherein the generating the fault prediction information based on the output result and in combination with the historical fault information specifically includes:
acquiring historical fault information, wherein the historical fault information comprises a fault phenomenon set;
identifying the output result, and determining a first fault phenomenon, wherein the first fault phenomenon is one or two of temperature data abnormality and voltage data abnormality;
judging whether the first fault phenomenon exists in the fault phenomenon set or not;
and if the first fault phenomenon exists in the fault phenomenon set, generating first fault prediction information, wherein the first fault prediction information comprises a first fault probability.
4. The memory bank failure prediction method according to claim 1, wherein the preset failure prediction model is trained before the monitoring data packet is input into the preset failure prediction model; the training of the preset fault prediction model specifically includes:
acquiring training information, wherein the training information comprises a monitoring data packet and fault prediction information;
inputting the training information into a self-adaptive feature fusion network for training to obtain a first training result;
superposing and standardizing the first training result and the training information to obtain a second training result;
inputting the second training result into the self-adaptive feature fusion network for processing to obtain a third training result;
and superposing and standardizing the third training result and the second training result until the training information similarity matrix is output, wherein the training information similarity matrix meets a preset logistic regression condition.
5. The memory bank failure prediction method according to claim 3, further comprising:
and if the first fault phenomenon does not exist in the fault phenomenon set, generating second fault prediction information, wherein the second fault prediction information comprises second fault probability which is lower than the first fault probability.
6. The method for predicting a memory bank fault according to claim 2, wherein the monitoring data packet further includes log data, and the inputting the monitoring data packet into a preset fault prediction model, to obtain an output result, specifically further includes:
according to a multi-scale channel attention mechanism, carrying out multi-scale feature extraction on the log data to obtain multi-scale feature information;
and inputting the multi-scale characteristic information into the preset fault prediction model to obtain a third result, wherein the output result comprises the third result.
7. The method for predicting a memory bank fault according to claim 6, wherein the inputting the multi-scale feature information into the preset fault prediction model to obtain a third result specifically includes:
calculating a similarity value between the multi-scale feature information and preset abnormal feature information, wherein the preset fault prediction model is pre-stored with a plurality of types of preset abnormal feature information;
and comparing the similarity value with a preset similarity threshold value to obtain the third result, wherein the preset similarity threshold value is determined by a fault log set in the historical fault information.
8. The method for predicting a memory bank failure according to claim 7, wherein generating failure prediction information based on the output result in combination with historical failure information, specifically further comprises:
and if the third result indicates that the similarity value is greater than or equal to the preset similarity threshold value, generating third fault prediction information, wherein the third fault prediction information comprises third fault probability, and the third fault probability is higher than the first fault probability.
9. A memory bank fault prediction device is characterized in that the memory bank fault prediction device is a server, the server comprises an acquisition module (21) and a processing module (22), wherein,
the acquisition module (21) is used for acquiring the monitoring data packet of the memory bank;
the processing module (22) is used for inputting the monitoring data packet into a preset fault prediction model to obtain an output result;
the processing module (22) is further configured to generate fault prediction information based on the output result and in combination with historical fault information, where the fault prediction information is used to indicate that a potential fault exists in the memory bank.
10. An electronic device, characterized in that the electronic device comprises a processor (31), a memory (35), a user interface (33) and a network interface (34), the memory (35) being adapted to store instructions, the user interface (33) and the network interface (34) being adapted to communicate to other devices, the processor (31) being adapted to execute the instructions stored in the memory (35) to cause the electronic device to perform the method according to any one of claims 1 to 8.

Priority Applications (1)

Application Number: CN202310918660.7A (CN116932324A); Priority Date: 2023-07-25; Filing Date: 2023-07-25; Title: Memory bank fault prediction method and device and electronic equipment

Publications (1)

Publication Number: CN116932324A; Publication Date: 2023-10-24

Family ID: 88376899

Country Status (1)

Country Link
CN (1) CN116932324A (en)

Similar Documents

US10387899B2: Systems and methods for monitoring and analyzing computer and network activity
WO2022068645A1: Database fault discovery method, apparatus, electronic device, and storage medium
US10067815B2: Probabilistic prediction of software failure
US10762544B2: Issue resolution utilizing feature mapping
US11805005B2: Systems and methods for predictive assurance
US10613525B1: Automated health assessment and outage prediction system
CN107924360A: Diagnosis frame in computing system
US9860109B2: Automatic alert generation
US20190354913A1: Method and system for quantifying quality of customer experience (cx) of an application
CN116502166A: Prediction method, device, equipment and medium based on other equipment data
CN113098715B: Information processing method, device, system, medium and computing equipment
CN116932324A: Memory bank fault prediction method and device and electronic equipment
CN115543665A: Memory reliability evaluation method and device and storage medium
CN114385398A: Request response state determination method, device, equipment and storage medium
JP2022037107A: Failure analysis device, failure analysis method, and failure analysis program
JP2007265244A: Performance monitoring device for web system
CN117149569A: Board running state early warning method and device and electronic equipment
CN113052509A: Model evaluation method, model evaluation apparatus, electronic device, and storage medium
CN111309585A: Log data testing method, device and system, electronic equipment and storage medium
US20230126193A1: Predictive Remediation Action System
CN114996119B: Fault diagnosis method, fault diagnosis device, electronic device and storage medium
JP7425918B1: Information processing device, information processing method and program
US20230359925A1: Predictive Severity Matrix
US20240080332A1: System and method for gathering, analyzing, and reporting global cybersecurity threats
US20230188408A1: Enhanced analysis and remediation of network performance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination